Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works
by injecting random noise into each query result, such that it is provably hard for the adversary to infer the presence or absence
of any individual record from the published noisy results. The main objective in differentially private query processing is to
maximize the accuracy of the query results, while satisfying the privacy guarantees. Previous work, notably [Li et al. 2010],
has suggested that with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably
higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution to find
such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive
methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose
the low-rank mechanism (LRM), the first practical differentially private technique for answering batch linear queries with high
accuracy. LRM works for both exact (i.e., ε-) and approximate (i.e., (ε, δ)-) differential privacy definitions. We derive the
utility guarantees of LRM, and provide guidance on how to set the privacy parameters given the user’s utility expectation.
Extensive experiments using real data demonstrate that our proposed method consistently outperforms state-of-the-art query
processing solutions under differential privacy, by large margins.
1. INTRODUCTION

Differential privacy [Dwork et al. 2006c] is an emerging paradigm for publishing statistical
information over sensitive data, with strong and rigorous guarantees on individuals’ privacy.
Since its proposal, differential privacy has attracted extensive research efforts, such as in
cryptography [Dwork et al. 2006c], algorithms [Dwork et al. 2010; Hardt and Talwar 2010;
McSherry and Talwar 2007], data mining [Bhaskar et al. 2010; Friedman and Schuster 2010],
database management [Ding et al. 2011; Hay et al. 2010; Xiao et al. 2011; Gao et al. 2010;
Peng et al. 2013], social network analysis [Rastogi et al. 2009; Hay et al. 2009; Sala et al. 2011]
and machine learning [Blum et al. 2008; Chaudhuri et al. 2011; Rubinstein et al. 2012]. The main
idea of differential privacy is to inject random noise into aggregate query results, such that the
adversary cannot infer, with high confidence, the presence or absence of any given record r in the
dataset, even if the adversary knows all other records in the dataset besides r. The adversary’s
maximum confidence in inferring private information is controlled by a user-specified parameter ε,
called the privacy budget. Given ε, the main goal of query processing under differential privacy is
to maximize the utility/accuracy of the (noisy) query answers, while satisfying the above privacy
requirements.

This work focuses on a common class of queries called linear counting queries, which is the
basic operation in many statistical analyses. Similar ideas apply to other types of linear queries,
e.g., linear sums. Figure 1(a) illustrates an example electronic medical record database, where each
record corresponds to an individual. Figure 1(b) shows the exact number of HIV+ patients in each
state, which we refer to as unit counts. A linear counting query in this example can be any linear
combination of the unit counts. For instance, let $x_{NY}$, $x_{NJ}$, $x_{CA}$, $x_{WA}$ be the patient counts in states
NY, NJ, CA, and WA respectively; one possible linear counting query is $x_{NY} + x_{NJ} + x_{CA} + x_{WA}$,
which computes the total number of HIV+ patients in the four states listed in our example. Another
example linear counting query is $x_{NY}/19 + x_{NJ}/8 + x_{CA}/37$, which calculates the weighted
average of patient counts in states NY, NJ and CA, with weights set according to their respective
population sizes. In general, we are given a database with n unit counts, and a batch Q of m linear
counting queries. The goal is to answer all queries in Q under differential privacy, and maximize
the expected overall accuracy of the queries.
Fig. 1. Example medical record database: (a) patient records; (b) statistics on HIV+ patients.


Straightforward approaches to answering a batch of linear counting queries usually lead to
sub-optimal result accuracy. Consider processing the query set $Q = \{q_1, q_2, q_3\}$ under the ε-differential
privacy definition, detailed in Section 3. One naive solution, referred to as noise on result (NOR),
is to process each query independently, e.g., using the Laplace mechanism [Dwork et al. 2006c].
This method fails to exploit the correlations between different queries. Consider a batch of three
different queries $q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA}$, $q_2 = x_{NY} + x_{NJ}$, $q_3 = x_{CA} + x_{WA}$. Clearly,
the three queries are correlated since $q_1 = q_2 + q_3$. Thus, an alternative strategy for answering
these queries is to process only $q_2$ and $q_3$, and use their sum to answer $q_1$. As will be explained
in Section 3, the amount of noise added to query results depends upon the sensitivity of the query
set, which is defined as the maximum possible total change in query results caused by adding or
removing a single record in the original database. Under ε-differential privacy, the sensitivity of the
query set $\{q_2, q_3\}$ is 1, because adding/removing a patient record in Figure 1(a) affects at most one
of $q_2$ and $q_3$ (i.e., $q_2$ if the record is associated with state NY or NJ, and $q_3$ if the state is CA or
WA), by exactly 1. On the other hand, the query set $\{q_1, q_2, q_3\}$ has a sensitivity of 2 (under the
ε-differential privacy definition), since a record in the above 4 states affects both $q_1$ and one of $q_2$ and
$q_3$. According to the Laplace mechanism, the variance of the added noise to each query is $2\Delta^2/\epsilon^2$,
where Δ is the sensitivity of the query set, and ε is the user-specified privacy budget. Therefore,
processing $\{q_1, q_2, q_3\}$ directly incurs a noise variance of $(2 \times 2^2)/\epsilon^2 = 8/\epsilon^2$ for each query; on the other
hand, executing $\{q_2, q_3\}$ leads to a noise variance of $(2 \times 1^2)/\epsilon^2 = 2/\epsilon^2$ for each of $q_2$ and $q_3$, and their
sum $q_1 = q_2 + q_3$ has a noise variance of $(2 \times 2)/\epsilon^2 = 4/\epsilon^2$. Clearly, the latter method obtains
higher accuracy for all queries.

Another simple solution, referred to as noise on data (NOD), is to process each unit count under
differential privacy, and combine them to answer the given linear counting queries. Continuing the
example, this method computes the noisy counts for $x_{NY}$, $x_{NJ}$, $x_{CA}$ and $x_{WA}$, and uses their
linear combinations to answer $q_1$, $q_2$, and $q_3$. This approach overlooks the correlations between
different unit counts. In our example, $x_{NY}$ and $x_{NJ}$ (and similarly, $x_{CA}$ and $x_{WA}$) are either both
present or both absent in every query, and, thus, can be seen as a single entity. Processing them as
independent queries incurs unnecessary accuracy costs when re-combining them. In the example,
NOD adds noise with variance $2/\epsilon^2$ to each unit count, and their combinations to answer $q_1$, $q_2$, and
$q_3$ have noise variance $8/\epsilon^2$, $4/\epsilon^2$ and $4/\epsilon^2$, respectively. NOD’s result utility is also worse than the
above-mentioned strategy of processing $q_2$ and $q_3$, and adding their results to answer $q_1$.
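
The arithmetic of this running example can be reproduced with a short script. The sketch below is illustrative and not part of the original paper: it assumes each query is written as a row of coefficients over ($x_{NY}$, $x_{NJ}$, $x_{CA}$, $x_{WA}$), takes the sensitivity to be the largest absolute column sum, and reports noise variances in units of $1/\epsilon^2$. All helper names are hypothetical.

```python
# Columns are the unit counts (x_NY, x_NJ, x_CA, x_WA).
Q1 = [1, 1, 1, 1]
Q2 = [1, 1, 0, 0]
Q3 = [0, 0, 1, 1]
WORKLOAD = [Q1, Q2, Q3]


def l1_sensitivity(rows):
    """Largest absolute column sum: the most one record can change the answers."""
    return max(sum(abs(row[j]) for row in rows) for j in range(len(rows[0])))


def variances(strategy, recombination):
    """Per-query noise variance, in units of 1/eps^2.

    Every strategy query receives Laplace noise of variance 2 * sensitivity^2;
    query i is then answered as sum_k recombination[i][k] * strategy_answer[k].
    """
    base = 2 * l1_sensitivity(strategy) ** 2
    return [base * sum(c * c for c in coeffs) for coeffs in recombination]


identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
unit_counts = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

nor = variances(WORKLOAD, identity3)                     # noise on result
nod = variances(unit_counts, WORKLOAD)                   # noise on data
strat = variances([Q2, Q3], [[1, 1], [1, 0], [0, 1]])    # answer q2, q3; q1 = q2 + q3

print("NOR:", nor)                  # [8, 8, 8]
print("NOD:", nod)                  # [8, 4, 4]
print("strategy {q2, q3}:", strat)  # [4, 2, 2]
```

The three printed lists match the variances quoted in the text, with the strategy of answering only $q_2$ and $q_3$ being at least as accurate as both naive methods on every query.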

In general, the query set Q may exhibit complex correlations among different queries and among 
different unit counts. As a consequence, it is non-trivial to obtain the best strategy to answer Q 
under differential privacy. For instance, consider the following query set: 

$$
\begin{aligned}
q_1 &= 2x_{NJ} + x_{CA} + x_{WA} \\
q_2 &= x_{NJ} + 2x_{WA} \\
q_3 &= x_{NY} + 2x_{CA} + 2x_{WA}
\end{aligned}
$$

NOR is clearly a poor choice, since it incurs a sensitivity of 5 under the ε-differential privacy
definition (e.g., a record of state WA affects $q_1$ by 1, and $q_2$ and $q_3$ by 2 each). The sensitivity
of NOD remains 1, and it answers $q_1$, $q_2$, and $q_3$ with noise variance $2 \times (2^2 + 1^2 + 1^2)/\epsilon^2$,
$2 \times (1^2 + 2^2)/\epsilon^2$ and $2 \times (1^2 + 2^2 + 2^2)/\epsilon^2$ respectively, leading to a sum-square error (SSE) of $40/\epsilon^2$.
The optimal strategy in terms of SSE in this case computes the noisy results of $q'_1 = x_{NY}/8 + x_{WA}$,
$q'_2 = -3x_{NY}/8 - x_{CA}$ and $q'_3 = x_{NY}/4 - x_{NJ}$. Then, it obtains the results for $q_1$, $q_2$, and $q_3$ as
follows.


$$
\begin{aligned}
q_1 &= q'_1 - q'_2 - 2q'_3 \\
q_2 &= 2q'_1 - q'_3 \\
q_3 &= 2q'_1 - 2q'_2
\end{aligned}
$$

The sensitivity of the above method is also 1, because (i) adding/removing a record of state NJ,
CA and WA can only affect queries $q'_3$, $q'_2$ and $q'_1$, respectively, by at most 1; (ii) adding/removing a
record of state NY causes the results of $q'_1$, $q'_2$ and $q'_3$ to change by at most 1/8, 3/8, and 1/4,
respectively, leading to a maximum total change of 1/8 + 3/8 + 1/4 = 3/4, which does not exceed 1. We
introduce the formal definition of sensitivity later in Section 3. Hence, independent random noise of
variance $2 \times 1^2/\epsilon^2 = 2/\epsilon^2$ is injected to the results of each of $q'_1$, $q'_2$ and $q'_3$. Their combination
$q_1 = q'_1 - q'_2 - 2q'_3$ thus has a noise variance of $2 \times (1^2 + (-1)^2 + (-2)^2)/\epsilon^2 = 12/\epsilon^2$. Similarly,
combining $q'_1$, $q'_2$ and $q'_3$ to answer $q_2$ and $q_3$ as above incurs noise variances of
$2 \times (2^2 + (-1)^2)/\epsilon^2 = 10/\epsilon^2$ and $2 \times (2^2 + (-2)^2)/\epsilon^2 = 16/\epsilon^2$,
respectively. The SSE for queries $q_1$ to $q_3$ is thus $12/\epsilon^2 + 10/\epsilon^2 + 16/\epsilon^2 = 38/\epsilon^2$.
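
The numbers in this second example can be verified with exact rational arithmetic. The following sketch is an illustration under stated assumptions, not code from the paper: the matrices `W`, `L` and `B` encode the workload, the strategy queries and the recombination coefficients as read from the text, with columns ordered (NY, NJ, CA, WA), and the function names are hypothetical.

```python
from fractions import Fraction as F

# Workload q1, q2, q3 over the columns (x_NY, x_NJ, x_CA, x_WA).
W = [[0, 2, 1, 1],
     [0, 1, 0, 2],
     [1, 0, 2, 2]]

# Strategy queries q1', q2', q3'.
L = [[F(1, 8), 0, 0, 1],
     [F(-3, 8), 0, -1, 0],
     [F(1, 4), -1, 0, 0]]

# Recombination: q1 = q1' - q2' - 2 q3', q2 = 2 q1' - q3', q3 = 2 q1' - 2 q2'.
B = [[1, -1, -2],
     [2, 0, -1],
     [2, -2, 0]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def l1_sensitivity(rows):
    """Largest absolute column sum."""
    return max(sum(abs(r[j]) for r in rows) for j in range(len(rows[0])))


def sse(strategy, recombination):
    """Sum-square error in units of 1/eps^2 for Laplace noise on the strategy."""
    base = 2 * l1_sensitivity(strategy) ** 2
    return base * sum(c * c for row in recombination for c in row)


identity4 = [[int(i == j) for j in range(4)] for i in range(4)]

assert matmul(B, L) == W       # the strategy reproduces the workload exactly
print(l1_sensitivity(W))       # 5  (NOR sensitivity)
print(l1_sensitivity(L))       # 1  (strategy sensitivity)
print(sse(identity4, W))       # 40 (NOD)
print(sse(L, B))               # 38 (strategy)
```

The check `matmul(B, L) == W` confirms that the recombination recovers every workload query, and the NY column of `L` sums to 3/4 in absolute value, so the overall sensitivity is determined by the other three columns.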

Observe that there is no simple pattern in the query set or the optimal strategy. Since there is
an infinite space of possible strategies, searching for the best one is a challenging problem. 


ACM Transactions on Database Systems, Vol. V, No. N, Article A, Publication date: January YYYY. 




G. Yuan et al. 


Li et al. [Li et al. 2010] first formalize the above observations (i.e., answering a correlated query
set with an effective strategy) into the matrix mechanism. However, as we explain in Section
2.2, the original matrix mechanism lacks a practical implementation, because the solutions in
[Li et al. 2010] for finding a good strategy are either inefficient (incurring prohibitively high
computational costs for even moderately large domains), or ineffective (rarely obtaining strategies
that outperform the naive methods NOD/NOR). Later, Li and Miklau [Li and Miklau 2012] propose
the adaptive mechanism, which can be seen as an implementation of the matrix mechanism. This
method, however, still has some drawbacks, as discussed in Section 2.2, which limit its accuracy.
Motivated by this, we propose the first practical realization of the matrix mechanism, called the
low-rank mechanism (LRM), based on the theory of low-rank matrix approximation. LRM applies
to both ε-differential privacy and (ε, δ)-differential privacy, the two most commonly used differential
privacy definitions today. We analyze the utility of LRM under (ξ, η)-usefulness [Blum et al. 2008],
a popular utility measure. Extensive experiments demonstrate that LRM significantly outperforms
existing solutions in terms of result accuracy, sometimes by orders of magnitude.

The rest of the paper is organized as follows. Section 2 reviews previous studies on differential
privacy. Section 3 provides formal definitions for our problem. Section 4 presents the mechanism
formulation of LRM under ε-differential privacy. Section 5 discusses how to solve the optimization
problem in LRM. Section 6 extends LRM to answer queries under (ε, δ)-differential privacy. Section
7 verifies the superiority of our proposal through an extensive experimental study. Finally, Section
8 concludes the paper.

2. RELATED WORK 

Section 2.1 surveys general-purpose mechanisms for enforcing differential privacy. Section 2.2
presents two methods that are closely related to the proposed solution, namely the matrix
mechanism and the adaptive mechanism.


2.1. Differential Privacy Mechanisms 

Differential privacy was first formally presented in [Dwork et al. 2006c], though some previous
studies have informally used similar models, e.g., [Dinur and Nissim 2003]. The Laplace
mechanism [Dwork et al. 2006c] is the first generic mechanism for enforcing differential privacy, which
works when the output domain is a multi-dimensional Euclidean space. McSherry and Talwar
[McSherry and Talwar 2007] propose the exponential mechanism, which applies to any problem
with a measurable output space. The generality of the exponential mechanism makes it an
important tool in the design of many other differentially private algorithms, e.g., [Cormode et al. 2012;
Xu et al. 2012; Xu et al. 2013; McSherry and Talwar 2007].

The original definition of differential privacy is ε-differential privacy, which focuses on
providing a strong and rigorous definition of privacy. Besides this, another popular definition is
(ε, δ)-differential privacy, which can be seen as an approximate version of ε-differential privacy. In many
applications, (ε, δ)-differential privacy provides a similarly strong privacy definition, while enabling
simpler and/or more accurate algorithms. One basic mechanism for enforcing (ε, δ)-differential
privacy is the Gaussian mechanism, which injects Gaussian noise into the query results, calibrated to
the $L_2$ sensitivity of the queries [Dwork et al. 2006a]. [Hardt and Roth 2012] employ a strategy of
k Gaussian measurements to compute low-rank approximations of large matrices. However,
(ε, δ)-differential privacy might be unsatisfactory in certain situations. For example, [De 2012]
demonstrates that (ε, δ)-differential privacy is weaker than ε-differential privacy in terms of mutual
information, even when δ is negligible. The proposed solution applies to both definitions of differential
privacy. We present details of these two privacy definitions in Section 3.

Linear query processing is of particular interest in both the theory and database communities,
due to its wide range of applications. To minimize the error of linear queries under differential
privacy requirements, several methods try to build a synopsis of the original database, such as
Fourier transformations [Rastogi and Nath 2010], wavelets [Xiao et al. 2010] and hierarchical trees
[Hay et al. 2010]. The compressive mechanism [Li et al. 2011] reduces the amount of noise
necessary to satisfy differential privacy, for datasets with a sparse representation. By publishing a noisy
synopsis under ε-differential privacy, these methods are capable of answering an arbitrary number
of linear queries. However, most of these methods obtain good accuracy only when the query
selection criterion is a continuous range; meanwhile, since these methods are not workload-aware, their
performance for a specific workload tends to be sub-optimal.

Workload-aware algorithms address this problem by optimizing the overall accuracy of a given set
of linear queries. This work falls into this category. Notable workload-aware methods include
(i) Multiplicative Weights / Exponential Mechanism (MWEM) [Hardt et al. 2012], (ii) the Matrix
Mechanism [Li et al. 2010] and (iii) the Adaptive Mechanism [Li and Miklau 2012]. MWEM
publishes a synthetic dataset optimized towards the given linear query set. In particular, it provides a
beautiful theoretical bound on the maximum error of the given queries, which grows sublinearly with
the number of records in the dataset, and logarithmically with the number of queries. In practice,
however, this bound tends to be loose as it is derived from worst-case scenarios. Meanwhile, the
target problem of MWEM is different from ours, as we focus on answering a given set of linear
queries rather than publishing synthetic data. Nevertheless, MWEM can be applied to our problem,
and we compare it against the proposed solution in the experiments. The Matrix Mechanism and the
Adaptive Mechanism share some features with the proposed solution, and we explain them
in detail in Section 2.2. It is worth mentioning that, as our experiments show, the proposed
solution outperforms all previous methods in terms of overall error, on a variety of datasets and
workload types.

Recently, Nikolov et al. [Nikolov et al. 2013] propose a workload decomposition method that injects
correlated Gaussian noise into the query results to satisfy (ε, δ)-differential privacy. They prove that their
solution provides an $O((\log m)^2)$ approximation to the optimal mechanism, where m is the number
of queries. However, this method is infeasible in practice, since it involves computing minimum 
enclosing ellipsoids (MEE), for which the current best algorithm takes time, where n is 

the number of unit counts. Nikolov et al. [Nikolov et al. 2013] suggest using an approximation method for
computing MEE, e.g., Khachiyan’s algorithm [Todd and Yildirim 2007]. This approximation algorithm still
takes high-order polynomial time to converge, which makes it prohibitively expensive for practical
applications.

Several theoretical studies have derived lower bounds for the noise level for processing linear
queries under differential privacy [Dinur and Nissim 2003; Hardt and Talwar 2010]. Notably, Dinur
and Nissim [Dinur and Nissim 2003] prove that any perturbation mechanism with maximal noise of
scale $O(n)$ cannot possibly preserve personal privacy, if the adversary is allowed to ask all possible
linear queries, and has exponential computation capacity. By reducing the computation capacity of
the adversary to polynomial-bounded Turing machines, they show that an error scale $\Omega(\sqrt{n})$ is
necessary to protect any individual’s privacy. More recently, Hardt and Talwar [Hardt and Talwar 2010]
have significantly tightened the error lower bound for answering a batch of linear queries under
differential privacy. Given a batch of m linear queries, they prove that any ε-differential privacy
mechanism leads to squared error of at least H(e“^m^Vol(IE)), where Vol(tE) is the volume of 
the convex body obtained by transforming the $L_1$-unit ball into m-dimensional space using the
linear transformations in the workload W. This paper extends their analysis to low-rank workload
matrices.

Another related line of research concerns answering queries interactively under differential
privacy. In this setting, the system processes queries one at a time, without knowing any future query.
Clearly, this problem is more difficult than the non-interactive setting described so far, where the
system knows all queries in the workload in advance. Most notably, Hardt and Rothblum propose the
Private Multiplicative Weights Mechanism (PMWM) [Hardt and Rothblum 2010], whose error is
asymptotically optimal with respect to the number of queries answered. The MWEM method described
above [Hardt et al. 2012] applies similar ideas to the non-interactive setting. Besides PMWM,
Hardt and Talwar [Hardt and Talwar 2010] propose the K-norm Mechanism, whose error level almost
reaches the lower bound derived in the same paper. Roth and Roughgarden introduce the Median
Mechanism [Roth and Roughgarden 2010] for answering arbitrary queries interactively. However, both the
K-norm Mechanism and the Median Mechanism rely on uniform sampling in a high-dimensional
convex body [Dyer et al. 1991], which theoretically takes polynomial time, but is usually too expensive
to be applied in practice.

Besides linear queries, differential privacy is also applicable to more complex queries in various
research areas, due to its strong privacy guarantee. In the field of data mining, Friedman and
Schuster [Friedman and Schuster 2010] propose the first algorithm for building a decision tree under
differential privacy. Mohammed et al. [Mohammed et al. 2011] study the same problem, and propose
an improved solution based on a generalization strategy coupled with the exponential mechanism.
Ding et al. [Ding et al. 2011] investigate the problem of differentially private data cube publication.
They present a randomized materialized view selection algorithm, which reduces the overall error,
and preserves data consistency.

In the database literature, a plethora of methods have been proposed to optimize the accuracy of
differentially private query processing. A tutorial on database-related differential privacy
technologies can be found in [Yang et al. 2012]. Cormode et al. [Cormode et al. 2012] investigate the
problem of multi-dimensional indexing under differential privacy, with the novel idea of assigning
different amounts of privacy budget to different levels of the index. Peng et al. [Peng et al. 2012] propose
the DP-tree, which obtains improved accuracy for higher-dimensional data. Xu et al. [Xu et al. 2012;
Xu et al. 2013] optimize the procedure of building a differentially private histogram; their method
combines dynamic programming for optimal histogram computation with the exponential
mechanism. Li et al. [Li et al. 2012] study the problem of how to perform frequent itemset mining on
transaction databases while satisfying differential privacy, with the novel approach of constructing a
basis set and then using it to find the most frequent patterns.

In addition, differential privacy for modeling security in social networks has also received much
attention in recent literature. Rastogi et al. [Rastogi et al. 2009] consider answering subgraph counting
queries in a social network. Their solution assumes a Bayesian adversary whose prior is drawn from a
distribution. They compute a high-probability upper bound on the local sensitivity of the data and then
answer by adding noise proportional to that bound. Hay et al. [Hay et al. 2009] show how to privately
approximate the degree distribution in the edge adjacency model of a graph. Also, Sala et al.
[Sala et al. 2011] develop a differentially private graph model based on dk-series reconstruction. Their
approach extracts a graph’s detailed structure into degree correlation statistics, injects noise into the
resulting dataset, and generates a synthetic graph.

Lastly, differential privacy is also becoming a hot topic in the machine learning
community, especially for learning tasks involving sensitive information, e.g., medical records. In
[Chaudhuri et al. 2011], Chaudhuri et al. propose a generic differentially private learning algorithm,
which requires strong convexity of the objective function. Rubinstein et al. [Rubinstein et al. 2012]
study the problem of SVM learning on sensitive data, and propose an algorithm to perturb the kernel
matrix with performance guarantees, when the gradient of the loss function satisfies the Lipschitz
continuity property. Zhang et al. propose the functional mechanism for a large class of
optimization-based analyses [Zhang et al. 2012]. Later, they propose the PrivGene framework, which combines
genetic algorithms and an enhanced version of the exponential mechanism for differentially private
model fitting [Zhang et al. 2013]. General differential privacy techniques have also been applied
to real systems, such as network trace analysis [McSherry and Mahajan 2010] and private
recommender systems [McSherry and Mironov 2009].


2.2. Matrix Mechanism and Adaptive Mechanism 

In the seminal work of [Li et al. 2010], Li et al. propose the matrix mechanism (MM), which
formalizes the intuition that a batch of correlated linear queries can be answered more accurately under
ε-differential privacy, by processing a different set of queries (called the strategy) and combining
their results. Specifically, given a workload of linear counting queries, MM first constructs a
workload matrix W of size m × n, where m is the number of queries, and n is the number of unit counts.
The construction of the workload matrix is elaborated further in Section 3. After that, MM searches
for a strategy matrix A of size r × n, where r is a positive integer. Intuitively, A corresponds to
another set of linear queries, such that every query in W can be expressed as a linear combination of
the queries in A. The matrix mechanism then answers the queries in A under ε-differential privacy,
and subsequently uses their noisy results to answer queries in W.
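
The two-step procedure (perturb the strategy answers, then recombine) can be sketched as follows. This is a minimal illustration and not the implementation of [Li et al. 2010]: it assumes that a strategy A and a recombination matrix B with W = B · A are already given, it uses the introductory example with strategy $\{q_2, q_3\}$, and the unit counts in `x` are made up rather than taken from Figure 1. The function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Zero-mean Laplace sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def l1_sensitivity(rows):
    """Largest absolute column sum of the strategy matrix."""
    return max(sum(abs(r[j]) for r in rows) for j in range(len(rows[0])))


def answer_with_strategy(A, B, x, eps, rng):
    """Answer the workload W = B * A: perturb the strategy answers A * x, then recombine."""
    scale = l1_sensitivity(A) / eps
    noisy = [sum(a * xi for a, xi in zip(row, x)) + laplace(scale, rng) for row in A]
    return [sum(b * y for b, y in zip(row, noisy)) for row in B]


# Strategy {q2, q3}; the workload query q1 is recovered as q2 + q3.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
B = [[1, 1],
     [1, 0],
     [0, 1]]
x = [120, 45, 300, 60]  # hypothetical unit counts (NY, NJ, CA, WA)

rng = random.Random(0)
result = answer_with_strategy(A, B, x, eps=1.0, rng=rng)
print(result)  # noisy answers to q1, q2, q3
```

Because all workload answers are derived from the same noisy strategy answers, they are mutually consistent: the noisy answer to $q_1$ equals the sum of the noisy answers to $q_2$ and $q_3$.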

The main challenge for applying the matrix mechanism to practical workloads is to identify an
appropriate strategy matrix A. Ref. [Li et al. 2010] provides two algorithms for this purpose. The
first, based on iteratively solving a pair of related semidefinite programs, incurs $O(m^3 n^3)$
computational overhead, which is prohibitively expensive even for moderately large values of m and n. The
second solution (called the approximate matrix mechanism (AMM)) computes an $L_2$ approximation of
the optimal strategy matrix A. This method, though faster than the first one, still requires high CPU
costs and memory consumption, and scales poorly with the domain size and query set cardinality.
In order to test the approximate matrix mechanism with large data and query sets in our
experiments, we have devised an improved solution, which we call the exponential smoothing mechanism
(ESM), based on the problem formulation of the approximate matrix mechanism in [Li et al. 2010].
ESM is at least as accurate as the method in [Li et al. 2010], and yet much more efficient. Hence, in
our experiments we use ESM in place of AMM. Appendix A.1 provides details of ESM.

There are, however, two main drawbacks of ESM (and also vanilla AMM). First, the $L_2$
approximation of the optimal strategy matrix often has poor quality. In fact, due to this problem, in our
experiments we found that under ε-differential privacy, the accuracy of ESM is often no better than
the naive solution NOD that injects noise directly into the unit counts. A second and more subtle
problem is that the formulation of the optimization program in AMM involves matrix inverse
operators, which can cause numerical instability when the final solution (i.e., the strategy matrix) is of
low rank, as explained in Appendix A.1. The proposed low-rank mechanism avoids both problems,
and achieves significantly higher result accuracy as shown in our experiments.

The idea of the matrix mechanism naturally extends to (ε, δ)-differential privacy, using the Gaussian
mechanism instead of the Laplace mechanism as the fundamental building block. In this case, the
optimization program is defined using the $L_2$ norm, and the AMM formulation is equivalent to that of
MM, meaning that AMM and ESM now solve the exact optimization program. Hence, in theory,
AMM can obtain optimal results. However, in practice, both ESM and the AMM implementation
in [Li et al. 2010] often fail to converge to the optimal strategy matrix, due to numerical instability
incurred by the matrix inverse operator in the AMM formulation.

Recently, Li and Miklau [Li and Miklau 2012] propose another implementation of AMM, called the
adaptive mechanism (AM). For any given workload W, AM attempts to find the best strategy matrix
by computing the optimal nonnegative weights for the eigenvectors of the workload matrix W. Since
the strategy matrix may have one or more columns whose $L_2$ norms are less than the sensitivity, they
refine the strategy matrix by appending some completing columns to the candidate strategy matrix
without raising the sensitivity. This post-processing step can therefore reduce the expected error.
AM has two serious drawbacks. First, it involves solving a complicated semidefinite program,
and it is not known whether their solution to the program converges to the optimal solution. Second
and more importantly, such a multistep strategy in AM does not offer any guarantee on optimality. The
proposed method LRM is free from these problems, and obtains significantly better performance as
we show in the experiments. Appendix A.2 provides details of AM.

3. PRELIMINARIES 

We focus on answering a batch of linear counting queries $Q = \{q_1, q_2, \ldots, q_m\}$ over a sensitive
database D. Each query $q_i \in Q$ is a linear combination of unit counts in the data domain, denoted as
$x_1, x_2, \ldots, x_n$. In the example of Figure 1, the sensitive database D contains records
corresponding to individual HIV+ patients; each unit count is the number of such patients in a state of the
US; each query in the example is a linear combination of these state-level patient counts. Our goal
is to answer Q with minimum overall error, while satisfying differential privacy. In particular, we
consider two definitions of differential privacy, namely ε-differential privacy (i.e., the original
definition of differential privacy) and (ε, δ)-differential privacy (a popular formulation of approximate
differential privacy). Our solutions use the Laplace mechanism (resp., the Gaussian mechanism) as a
fundamental building block to enforce ε- (resp., (ε, δ)-) differential privacy. In the following, Section
3.1 presents the definition of ε-differential privacy and the Laplace mechanism. Section 3.2 covers
(ε, δ)-differential privacy and the Gaussian mechanism. Section 3.3 describes naive approaches to
answering a batch of linear counting queries. Section 3.4 explains important properties of low-rank
matrices that are used in our solutions. Table I summarizes frequently used notations throughout the
paper.

Table I. Summary of frequent notations

Symbol        Meaning
D             input database
n             number of unit counts
Q             input query set
m             number of queries in Q
W             workload matrix, i.e., the matrix representation of Q
B, L          a decomposition of W satisfying W = B · L
s             rank of workload matrix W
r             number of columns in B (also number of rows in L)
Q(D)          exact answer of Q on database D
Δ(Q)          $L_1$ sensitivity of Q
Θ(Q)          $L_2$ sensitivity of Q
ε, δ          privacy parameters
ξ, η          utility parameters
κ(W)          generalized condition number of matrix W
P(W)          p-coherence of matrix W
|||X|||_1     maximum absolute column sum of matrix X
|||X|||_2     spectral norm, maximum singular value of matrix X
|||X|||_∞     maximum absolute row sum of matrix X
||X||_*       nuclear norm, sum of the singular values of matrix X
||X||_F       Frobenius norm, square root of the sum of squared elements of matrix X

3.1. ε-Differential Privacy and the Laplace Mechanism

The basic idea behind the privacy guarantee of differential privacy is the indistinguishability
between neighbor databases. Two databases D and D' are called neighbor databases, iff D' can be
obtained by adding or removing exactly one record from D. In the example of Figure 1, a neighbor
database can be obtained by removing an individual from the original data, or by adding another
one. For linear counting queries, the essential difference between two neighbor databases D and D'
is that they differ on exactly one unit count, by exactly one. Formally, let $\{x_1, x_2, \ldots, x_n\}$ be the
set of unit counts corresponding to D and $\{x'_1, x'_2, \ldots, x'_n\}$ be the unit counts for D'. Then, there
exists an i, $1 \le i \le n$, such that $x_j = x'_j$ for all $j \ne i$, and $|x_i - x'_i| = 1$.

Given a set of queries Q, a randomized mechanism M for answering Q satisfies ε-differential
privacy, iff for every possible pair of neighbor databases D and D', the following inequality holds:

$$\forall R:\ \Pr(M(Q, D) = R) \le e^{\epsilon} \cdot \Pr(M(Q, D') = R) \tag{1}$$

where R is any possible output of M, and M(Q, D) (resp., M(Q, D')) is the output of M given
query set Q and input database D (resp., D'). This inequality indicates that given an output R of M,
the adversary can only have limited confidence for inferring whether the input database is D or D',
regardless of his/her background knowledge. Since D and D' can be any two neighbor databases
that differ in any record, the above inequality also limits the adversary’s confidence for inferring the
presence or absence of a record in the input database; hence, it provides plausible deniability to any
individual involved in the sensitive data.

The Laplace mechanism [Dwork et al. 2006c] is a fundamental solution for enforcing
$\epsilon$-differential privacy, based on the concept of $\mathcal{L}_1$ sensitivity. Given a query set $Q$, its
$\mathcal{L}_1$ sensitivity $\Delta(Q)$ is the maximum $\mathcal{L}_1$ distance between the exact results of $Q$ on any
pair of neighbor databases $D$ and $D'$. Formally, we have:

$$\Delta(Q) = \max_{D, D'} \|Q(D) - Q(D')\|_1 \tag{2}$$

Note that in the above equation, $D$ and $D'$ can be any pair of neighbor databases. Hence, $\Delta(Q)$ is
a property of the query set $Q$ and the data domain, and it does not depend upon the actual sensitive
data $D$. In the example of Figure 1, the $\mathcal{L}_1$ sensitivity of a single query
$q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA}$ is 1, because any two neighbor databases $D$ and $D'$ differ on only
one unit count (which can be one of $x_{NY}$, $x_{NJ}$, $x_{CA}$ or $x_{WA}$) by exactly 1. If we include
$q_2 = x_{NY} + x_{NJ}$ and $q_3 = x_{CA} + x_{WA}$ in the query set $Q$, the $\mathcal{L}_1$ sensitivity of
$Q = \{q_1, q_2, q_3\}$ is 2, because a change of 1 on any of $x_{NY}$, $x_{NJ}$, $x_{CA}$ or $x_{WA}$ affects
the result of $q_1$ by 1, and either one (but not both) of $q_2$ and $q_3$ by 1, leading to an
$\mathcal{L}_1$ distance of 2.

Given a database $D$ and a query set $Q$, the Laplace mechanism (denoted as $M_{Lap}$) outputs a
randomized result set $R$ that follows the Laplace distribution with mean $Q(D)$ and scale
$\Delta(Q)/\epsilon$, i.e.,

$$\Pr(M_{Lap}(Q, D) = R) \propto \exp\left(-\frac{\epsilon}{\Delta(Q)} \|R - Q(D)\|_1\right) \tag{3}$$

This is equivalent to adding independent Laplace noise to the exact result of each query in $Q$, i.e.,
$M_{Lap}(Q, D) = Q(D) + Lap\left(\frac{\Delta(Q)}{\epsilon}\right)^m$, where $m$ is the number of queries in $Q$, and
$Lap(\lambda)$ is a random variable following the zero-mean Laplace distribution with scale
$\lambda = \Delta(Q)/\epsilon$. The probability density function of the zero-mean Laplace distribution is:

$$f(x) = \frac{1}{2\lambda} \exp\left(-\frac{|x|}{\lambda}\right) \tag{4}$$

According to properties of the Laplace distribution, the variance of $Lap(\lambda)$ is
$2\lambda^2 = 2\Delta(Q)^2/\epsilon^2$.

Since the Laplace noise injected to each of the $m$ query results is independent, the overall expected
squared error of the query answers obtained by the Laplace mechanism is $2m\Delta(Q)^2/\epsilon^2$. In our
running example in Figure 1, to answer the query set $Q = \{q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA},
q_2 = x_{NY} + x_{NJ}, q_3 = x_{CA} + x_{WA}\}$ under $\epsilon$-differential privacy, a direct application of the
Laplace mechanism injects independent, zero-mean Laplace noise of scale $2/\epsilon$ to the exact result of
each of $q_1$, $q_2$ and $q_3$, since the $\mathcal{L}_1$ sensitivity for this set of queries is 2, as discussed
above. The overall squared error for $Q$ is thus $3 \times 2 \times (2/\epsilon)^2 = 24/\epsilon^2$.
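The Laplace mechanism on the running example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the column order (NY, NJ, CA, WA) and the unit counts are hypothetical, and the sensitivity is computed as the largest absolute column sum of the query weights, which matches the value 2 derived above.

```python
import random


def l1_sensitivity(workload):
    """Largest absolute column sum of the query weights (rows = queries)."""
    n_cols = len(workload[0])
    return max(sum(abs(row[j]) for row in workload) for j in range(n_cols))


def sample_laplace(scale, rng):
    """Zero-mean Laplace sample: difference of two i.i.d. unit exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(workload, counts, epsilon, rng):
    """Exact answers plus independent Laplace noise of scale Delta(Q)/epsilon."""
    scale = l1_sensitivity(workload) / epsilon
    exact = [sum(w * x for w, x in zip(row, counts)) for row in workload]
    return [a + sample_laplace(scale, rng) for a in exact]


def laplace_expected_error(workload, epsilon):
    """Closed form 2 * m * Delta(Q)^2 / epsilon^2 from the text."""
    return 2.0 * len(workload) * l1_sensitivity(workload) ** 2 / epsilon ** 2


# Running example; the counts below are made up for illustration.
W = [[1, 1, 1, 1],   # q1 = x_NY + x_NJ + x_CA + x_WA
     [1, 1, 0, 0],   # q2 = x_NY + x_NJ
     [0, 0, 1, 1]]   # q3 = x_CA + x_WA
counts = [82, 67, 119, 95]
noisy = laplace_mechanism(W, counts, epsilon=1.0, rng=random.Random(0))
```

With $\epsilon = 1$, `laplace_expected_error(W, 1.0)` evaluates the closed form $2m\Delta(Q)^2/\epsilon^2$, which is 24 for this query set.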


3.2. $(\epsilon, \delta)$-Differential Privacy and the Gaussian Mechanism

$\epsilon$-differential privacy can be difficult to enforce, especially for queries with high $\mathcal{L}_1$
sensitivity, or those whose $\mathcal{L}_1$ sensitivity is difficult to analyze. Hence, relaxed versions of
$\epsilon$-differential privacy have been studied in the past, among which a popular definition is
$(\epsilon, \delta)$-differential privacy, also called approximate differential privacy. This definition
involves an additional parameter $\delta$, which is a non-negative real number controlling how closely
this definition approximates $\epsilon$-differential privacy. Formally, let $Range(M)$ be the set of all
possible outputs of a mechanism $M$. A randomized mechanism $M$ satisfies $(\epsilon, \delta)$-differential
privacy iff for any two neighbor databases $D$ and $D'$, the following holds:

$$\forall R \subseteq Range(M):\ \Pr(M(Q, D) \in R) \le e^{\epsilon} \cdot \Pr(M(Q, D') \in R) + \delta \tag{5}$$

where $R$ is any set of possible results of $M$. It can be derived that when $\delta = 0$,
$(\epsilon, \delta)$-differential privacy is equivalent to $\epsilon$-differential privacy. Accordingly, since
$\delta$ is non-negative, any mechanism that satisfies $\epsilon$-differential privacy also satisfies
$(\epsilon, \delta)$-differential privacy for any value of $\delta$. When $\delta > 0$, $(\epsilon, \delta)$-differential
privacy relaxes $\epsilon$-differential privacy by ignoring outputs of $M$ with very small probability
(controlled by parameter $\delta$). In other words, an $(\epsilon, \delta)$-differentially private mechanism
satisfies $\epsilon$-differential privacy with a probability controlled by $\delta$.

A basic mechanism for enforcing $(\epsilon, \delta)$-differential privacy is the Gaussian mechanism
[Dwork et al. 2006b], which involves the concept of $\mathcal{L}_2$ sensitivity. For any two neighbor
databases $D$ and $D'$, the $\mathcal{L}_2$ sensitivity $\Theta(Q)$ of a query set $Q$ is defined as:

$$\Theta(Q) = \max_{D, D'} \|Q(D) - Q(D')\|_2 \tag{6}$$

In the running example shown in Figure 1, the $\mathcal{L}_2$ sensitivity for the query set
$Q = \{q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA}, q_2 = x_{NY} + x_{NJ}, q_3 = x_{CA} + x_{WA}\}$ is $\sqrt{2}$, since
the exact results of $q_1$ (as well as one of $q_2$ and $q_3$) differ by at most 1 for any two neighbor
databases, leading to an $\mathcal{L}_2$ sensitivity of $\sqrt{1^2 + 1^2} = \sqrt{2}$. Similar to
$\mathcal{L}_1$ sensitivity, the $\mathcal{L}_2$ sensitivity $\Theta(Q)$ depends on the data domain and the
query set $Q$, not the actual data. Given a database $D$ and a query set $Q$, the Gaussian
mechanism (denoted by $M_{Gau}$) outputs a random result that follows the Gaussian distribution with
mean $Q(D)$ and magnitude $\sigma = \frac{\Theta(Q)}{h(\epsilon, \delta)}$, where
$h(\epsilon, \delta) = \frac{\epsilon}{\sqrt{8 \ln(2/\delta)}}$. This is equivalent to adding $m$-dimensional
independent Gaussian noise $Gau\left(\frac{\Theta(Q)}{h(\epsilon, \delta)}\right)^m$, in which $Gau(\sigma)$ is a random
variable following a zero-mean Gaussian distribution with scale $\sigma = \frac{\Theta(Q)}{h(\epsilon, \delta)}$. The
probability density function of the zero-mean Gaussian distribution is:

$$g(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \tag{7}$$

According to properties of the Gaussian distribution, the variance of $Gau(\sigma)$ is
$\sigma^2 = \frac{\Theta(Q)^2}{h(\epsilon, \delta)^2}$. Since independent Gaussian noise is injected to each of the $m$
query results, the total expected squared error for the query set is $\frac{m\Theta(Q)^2}{h(\epsilon, \delta)^2}$. In
our running example in Figure 1, to answer the query set
$Q = \{q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA}, q_2 = x_{NY} + x_{NJ}, q_3 = x_{CA} + x_{WA}\}$ under
$(\epsilon, \delta)$-differential privacy, a direct application of the Gaussian mechanism injects independent,
zero-mean Gaussian noise of scale $\frac{\sqrt{2}}{h(\epsilon, \delta)}$ to the exact result of each of $q_1$, $q_2$ and
$q_3$, since the $\mathcal{L}_2$ sensitivity for this set of queries is $\sqrt{2}$, according to Equation (6).
The overall squared error for $Q$ is thus

$$\frac{3 \times (\sqrt{2})^2}{h(\epsilon, \delta)^2} = \frac{48 \ln(2/\delta)}{\epsilon^2}.$$
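A comparable sketch for the Gaussian mechanism follows, under the same caveats: the helper names and the unit counts are hypothetical, and the noise scale simply uses $h(\epsilon, \delta) = \epsilon/\sqrt{8\ln(2/\delta)}$ as stated in the text. The $\mathcal{L}_2$ sensitivity is computed as the largest column $\mathcal{L}_2$ norm of the query weights, which gives $\sqrt{2}$ on the running example.

```python
import math
import random


def l2_sensitivity(workload):
    """Largest column L2 norm of the query weights (rows = queries)."""
    n_cols = len(workload[0])
    return max(math.sqrt(sum(row[j] ** 2 for row in workload))
               for j in range(n_cols))


def h(epsilon, delta):
    """h(eps, delta) = eps / sqrt(8 ln(2/delta)), as defined in the text."""
    return epsilon / math.sqrt(8.0 * math.log(2.0 / delta))


def gaussian_mechanism(workload, counts, epsilon, delta, rng):
    """Exact answers plus independent Gaussian noise of scale Theta(Q)/h."""
    sigma = l2_sensitivity(workload) / h(epsilon, delta)
    exact = [sum(w * x for w, x in zip(row, counts)) for row in workload]
    return [a + rng.gauss(0.0, sigma) for a in exact]


def gaussian_expected_error(workload, epsilon, delta):
    """Closed form m * Theta(Q)^2 / h(eps, delta)^2 from the text."""
    theta = l2_sensitivity(workload)
    return len(workload) * theta ** 2 / h(epsilon, delta) ** 2


W = [[1, 1, 1, 1],   # q1
     [1, 1, 0, 0],   # q2
     [0, 0, 1, 1]]   # q3
counts = [82, 67, 119, 95]   # made-up unit counts
noisy = gaussian_mechanism(W, counts, epsilon=1.0, delta=0.01,
                           rng=random.Random(0))
```

For this query set, `gaussian_expected_error(W, epsilon, delta)` agrees with the closed form $48\ln(2/\delta)/\epsilon^2$ up to floating-point rounding.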


3.3. Naive Solutions for Answering a Batch of Linear Counting Queries 

This paper focuses on answering a batch of linear counting queries, each of which is a linear
combination of the unit counts of the input database $D$. Formally, given a weight vector
$(w_1, w_2, \ldots, w_n)^T \in \mathbb{R}^n$, a linear counting query can be expressed as:

$$q(D) = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$

We aim to answer a batch of $m$ linear queries, $Q = \{q_1, q_2, \ldots, q_m\}$. The query set $Q$ thus can
be represented by a workload matrix $W$ with $m$ rows and $n$ columns. Each entry $W_{ij}$ in $W$ is the
weight in query $q_i$ on the $j$-th unit count $x_j$. Since we do not use any other information of the input
database $D$ besides the unit counts, in the following we abuse the notation by using $D$ to represent
the vector of unit counts, i.e., $D = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$. Hence, the query batch $Q$ can be
answered by:

$$Q(D) = WD = \left(\sum_j W_{1j} x_j,\ \ldots,\ \sum_j W_{mj} x_j\right)^T \in \mathbb{R}^{m \times 1}$$


Two naive solutions for enforcing differential privacy on a query batch are as follows. 

Noise on data (NOD). The main idea of NOD is to add noise to each unit count. Then, the set of
noisy unit counts is published, which can be used to answer any linear counting query. Because
two neighbor databases differ on exactly one unit count, by exactly 1, both the $\mathcal{L}_1$ and the
$\mathcal{L}_2$ sensitivity for the set of unit counts are 1, according to their respective definitions. NOD
employs the Laplace mechanism to enforce $\epsilon$-differential privacy (or the Gaussian mechanism to
enforce $(\epsilon, \delta)$-differential privacy) on the published unit counts, and then combines the noisy unit
counts to answer the query batch $Q$. Let $M_{NOD,\epsilon}$ and $M_{NOD,(\epsilon,\delta)}$ denote the NOD mechanism for
enforcing $\epsilon$-differential privacy and $(\epsilon, \delta)$-differential privacy, respectively. We have:

$$M_{NOD,\epsilon}(Q, D) = W \left(D + Lap\left(\frac{1}{\epsilon}\right)^n\right)$$

$$M_{NOD,(\epsilon,\delta)}(Q, D) = W \left(D + Gau\left(\frac{1}{h(\epsilon, \delta)}\right)^n\right)$$

where $h(\epsilon, \delta) = \frac{\epsilon}{\sqrt{8 \ln(2/\delta)}}$, as in the Gaussian mechanism.

Based on the analysis of the Laplace and Gaussian mechanisms, the expected squared error for
$M_{NOD,\epsilon}$ and $M_{NOD,(\epsilon,\delta)}$ is $\frac{2}{\epsilon^2} \sum_{i,j} W_{ij}^2$ and
$\frac{1}{h(\epsilon, \delta)^2} \sum_{i,j} W_{ij}^2$, respectively. For both privacy definitions, the error of NOD is
proportional to the sum of the squared entries in $W$.

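Noise on results (NOR). NOR simply applies the Laplace mechanism (for $\epsilon$-differential privacy)
or the Gaussian mechanism (for $(\epsilon, \delta)$-differential privacy) directly on the query set $Q$. Recall that
each query $q_i \in Q$ is a linear combination of the unit counts, i.e., $q_i = \sum_j W_{ij} x_j$. Meanwhile, two
neighbor databases differ on exactly one unit count, by exactly 1. Therefore, the sensitivity (both
$\mathcal{L}_1$ and $\mathcal{L}_2$) of $q_i$ is $\max_j |W_{ij}|$, i.e., the maximum absolute unit count weight in $q_i$.
Regarding $Q$, its $\mathcal{L}_1$ sensitivity is $\Delta(Q) = \max_j \sum_i |W_{ij}|$, i.e., the highest column
absolute sum [Li et al. 2010]. Similarly, its $\mathcal{L}_2$ sensitivity is
$\Theta(Q) = \max_j \sqrt{\sum_i W_{ij}^2}$, i.e., the highest column $\mathcal{L}_2$ norm value [Li et al. 2010].
Thus, $M_{NOR,\epsilon}$ and $M_{NOR,(\epsilon,\delta)}$ output the following results.

$$M_{NOR,\epsilon}(Q, D) = WD + Lap\left(\frac{\Delta(Q)}{\epsilon}\right)^m$$

$$M_{NOR,(\epsilon,\delta)}(Q, D) = WD + Gau\left(\frac{\Theta(Q)}{h(\epsilon, \delta)}\right)^m$$

where $\Delta(Q) = \max_j \sum_i |W_{ij}|$, $\Theta(Q) = \max_j \sqrt{\sum_i W_{ij}^2}$, and
$h(\epsilon, \delta) = \frac{\epsilon}{\sqrt{8 \ln(2/\delta)}}$.

Similar to the analysis of the Laplace and the Gaussian mechanisms, the expected squared error
of $M_{NOR,\epsilon}$ on query set $Q$ is $\frac{2m\Delta(Q)^2}{\epsilon^2} = \frac{2m}{\epsilon^2} \max_j \left(\sum_i |W_{ij}|\right)^2$, and
the expected squared error of $M_{NOR,(\epsilon,\delta)}$ is
$\frac{m\Theta(Q)^2}{h(\epsilon, \delta)^2} = \frac{m}{h(\epsilon, \delta)^2} \max_j \sum_i W_{ij}^2$. An interesting observation is
that under $(\epsilon, \delta)$-differential privacy, NOR obtains lower expected squared error than NOD iff
$m \max_j \sum_i W_{ij}^2 < \sum_{i,j} W_{ij}^2$. Note that when $m \ge n$, this inequality can never hold, because
$\sum_{i,j} W_{ij}^2 \le n \max_j \sum_i W_{ij}^2$. This implies that NOR can be more effective than NOD only
when the number of queries $m$ is smaller than the number of unit counts $n$.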
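The closed-form errors of NOD and NOR can be compared directly. The sketch below is illustrative only: the function names are hypothetical, and the formulas are the ones stated above for the two mechanisms (with `delta=None` selecting the $\epsilon$-differential privacy variant).

```python
import math


def h(epsilon, delta):
    """h(eps, delta) = eps / sqrt(8 ln(2/delta)), as defined in the text."""
    return epsilon / math.sqrt(8.0 * math.log(2.0 / delta))


def sum_of_squares(W):
    return sum(w * w for row in W for w in row)


def max_abs_column_sum(W):
    return max(sum(abs(row[j]) for row in W) for j in range(len(W[0])))


def max_column_square_sum(W):
    return max(sum(row[j] ** 2 for row in W) for j in range(len(W[0])))


def nod_error(W, epsilon, delta=None):
    """Expected squared error of noise-on-data."""
    if delta is None:
        return 2.0 * sum_of_squares(W) / epsilon ** 2
    return sum_of_squares(W) / h(epsilon, delta) ** 2


def nor_error(W, epsilon, delta=None):
    """Expected squared error of noise-on-results."""
    m = len(W)
    if delta is None:
        return 2.0 * m * max_abs_column_sum(W) ** 2 / epsilon ** 2
    return m * max_column_square_sum(W) / h(epsilon, delta) ** 2


W = [[1, 1, 1, 1],   # q1
     [1, 1, 0, 0],   # q2
     [0, 0, 1, 1]]   # q3
```

On this workload ($m = 3 < n = 4$), the formulas favor NOD under $\epsilon$-differential privacy ($16/\epsilon^2$ versus $24/\epsilon^2$), while under $(\epsilon, \delta)$-differential privacy they favor NOR, since $m \max_j \sum_i W_{ij}^2 = 6 < 8 = \sum_{i,j} W_{ij}^2$.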
3.4. Low-Rank Matrices and Matrix Norms

The rank of a real-valued matrix $W$ is the number of non-zero singular values obtained by performing
singular value decomposition (SVD) of $W$. SVD decomposes $W$ of size $m \times n$ into the product
of three matrices: $W = U \Sigma V$. $U$ has orthonormal columns, $V$ has orthonormal rows,
and $\Sigma$ is a diagonal matrix with positive real diagonal values, which are the singular values
of $W$. Let $s$ be the number of such singular values, i.e., the rank of $W$. Then, matrices $U$, $\Sigma$, and $V$
are of sizes $m \times s$, $s \times s$, and $s \times n$ respectively. SVD guarantees that $s \le \min\{m, n\}$.

A matrix $W$ of size $m \times n$ whose rank is less than $\min\{m, n\}$ is called a low-rank matrix. This
happens when the rows and columns of $W$ are correlated. In the running example of Figure 1, the
workload matrix corresponding to the query set $Q = \{q_1 = x_{NY} + x_{NJ} + x_{CA} + x_{WA},
q_2 = x_{NY} + x_{NJ}, q_3 = x_{CA} + x_{WA}\}$ is a low-rank matrix, since the queries in $Q$ are correlated
(i.e., $q_1 = q_2 + q_3$), and the unit counts are also correlated (e.g., $x_{NY}$ and $x_{NJ}$). The main idea of the
proposed low-rank mechanism is to exploit the low-rank property of the workload matrix to reduce
the amount of noise required to satisfy differential privacy.

An important concept used in the proposed solution is the matrix norm, which is an extension
of the notion of vector norms to matrices. Two common definitions of the matrix norm are: (i)
Entrywise norm, which treats a matrix $W$ of size $m \times n$ simply as a vector of size $m \times n$ consisting
of all entries of $W$, and applies one of the vector norm definitions. For example, applying the
$\mathcal{L}_2$-norm to all entries in $W$ obtains $\|W\|_2 = \sqrt{\sum_i \sum_j W_{ij}^2}$, which is also called the
Frobenius norm, written as $\|W\|_F$. (ii) Induced norm (or operator norm), defined by
$|||W|||_p = \max_{x \ne 0} \frac{\|Wx\|_p}{\|x\|_p}$, where $x$ is a vector of size $n$, and $\|x\|_p$ is the
$\mathcal{L}_p$ norm of $x$. Notably, $|||W|||_1$ is simply the maximum absolute column sum of $W$, and
$|||W|||_\infty$ is simply the maximum absolute row sum of the matrix $W$.
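To make the rank and norm definitions concrete, here is a small sketch evaluated on the running-example workload. The helper names are hypothetical, and the rank is computed by exact Gaussian elimination over rationals rather than by SVD; this is an assumption made to keep the example dependency-free, and it gives the same rank for matrices with integer or rational entries.

```python
import math
from fractions import Fraction


def matrix_rank(W):
    """Rank via exact Gaussian elimination (not SVD); for rational entries."""
    rows = [[Fraction(v) for v in row] for row in W]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def frobenius_norm(W):
    """Entrywise L2 norm."""
    return math.sqrt(sum(w * w for row in W for w in row))


def induced_1_norm(W):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in W) for j in range(len(W[0])))


def induced_inf_norm(W):
    """Maximum absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


W = [[1, 1, 1, 1],   # q1
     [1, 1, 0, 0],   # q2
     [0, 0, 1, 1]]   # q3
```

Here `matrix_rank(W)` is 2, which is below $\min\{m, n\} = 3$, so this workload is low-rank in the sense defined above.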

4. WORKLOAD DECOMPOSITION 

Recall that the example in Figure 1 shows that sometimes it is best to answer a batch of linear
counting queries $Q$ indirectly, by first answering a set of intermediate linear counting queries under
differential privacy, and then combining their results to answer $Q$. The proposed low-rank mechanism
(LRM) follows this idea. Specifically, given a workload matrix $W$ corresponding to the query set
$Q$, LRM decomposes $W$ into the product of two matrices $W = BL$. $B$ is of size $m \times r$ and $L$ is
of size $r \times n$. Here, $r$ is a parameter to be determined which specifies the number of intermediate
queries; $L$ corresponds to the set of intermediate linear counting queries to answer under differential
privacy; $B$ indicates how the results of these intermediate queries are combined to answer $Q$. The
main challenge lies in how to choose the best decomposition that minimizes the overall error of $Q$,
as there is a vast search space for possible decompositions. In this section, we model the search
for the optimal matrix decomposition as a constrained optimization program, which is solved in
the next section. For ease of presentation, we focus on $\epsilon$-differential privacy in this and the next
section, and defer the discussion of $(\epsilon, \delta)$-differential privacy until Section 6. In addition, we provide
asymptotic error bounds for LRM in Appendix B.

In the following, Section 4.1 formalizes LRM and the optimization program of workload
decomposition. Section 4.2 analyzes the result utility of LRM with the optimal workload decomposition,
and discusses the selection of the privacy parameter $\epsilon$. Finally, Section 4.3 presents a relaxed
optimization program for workload decomposition which can further improve the accuracy of LRM for
certain workloads.

4.1. Optimization Program Formulation 

We first formalize LRM under $\epsilon$-differential privacy. Given $W$ and its decomposition $W = BL$,
LRM first applies the Laplace mechanism to the intermediate queries specified by $L$. Let $\Delta(L)$
denote the $\mathcal{L}_1$ sensitivity of these intermediate queries. Similar to the case of NOR discussed in
Section 3.3, $\Delta(L)$ is the maximum sum of absolute values of a column in $L$, which is:

$$\Delta(L) = \max_j \sum_{i=1}^{r} |L_{ij}|$$

Applying the Laplace mechanism, we obtain the noisy results of the intermediate queries:

$$LD + Lap\left(\frac{\Delta(L)}{\epsilon}\right)^r$$

where $D$ denotes the vector of unit counts. Next, LRM multiplies matrix $B$ with the noisy
intermediate results, which essentially recombines the intermediate results to answer $Q$. Let
$M_{LRM,\epsilon}(Q, D)$ denote LRM under $\epsilon$-differential privacy. We have:

$$M_{LRM,\epsilon}(Q, D) = B \left(LD + Lap\left(\frac{\Delta(L)}{\epsilon}\right)^r\right) \tag{8}$$

Since $W = BL$, we have $Q(D) = WD = BLD$. Hence, the output $M_{LRM,\epsilon}(Q, D)$ can be
seen as the sum of two components: $BLD$ and $B \cdot Lap\left(\frac{\Delta(L)}{\epsilon}\right)^r$. The former is the exact
result of $Q$, and the latter is the noise added in order to satisfy differential privacy. Next we analyze
the error of LRM. First we define the scale of a decomposition, as follows.

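Definition 4.1. Scale of a workload decomposition. Given a workload decomposition $W = BL$,
its scale $\Phi(B)$ is the sum of the squared entries in $B$, i.e., $\Phi(B) = \sum_{i,j} B_{ij}^2$.

Meanwhile, we call $\Delta(L)$ the $\mathcal{L}_1$ sensitivity of the decomposition $W = BL$. The following
lemma shows that the expected squared error of LRM is linear in the scale of the decomposition,
and quadratic in the $\mathcal{L}_1$ sensitivity of the decomposition.

Lemma 4.2. The expected squared error of $M_{LRM,\epsilon}(Q, D)$ using decomposition $W = BL$ is

$$\frac{2\Phi(B)\Delta(L)^2}{\epsilon^2}$$

Proof. According to Equation (8), $M_{LRM,\epsilon}(Q, D) - Q(D) = B \cdot Lap\left(\frac{\Delta(L)}{\epsilon}\right)^r$. The $r$
noise variables are independent, each with zero mean and variance $\frac{2\Delta(L)^2}{\epsilon^2}$, so the expected
squared error of the mechanism is $\sum_{i,j} B_{ij}^2 \cdot \frac{2\Delta(L)^2}{\epsilon^2}$. Since $\Phi(B) = \sum_{i,j} B_{ij}^2$, the
error can be rewritten as $\frac{2\Phi(B)(\Delta(L))^2}{\epsilon^2}$. □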
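Equation (8) and Lemma 4.2 can be illustrated on the running example. The sketch below assumes one particular decomposition $W = BL$, chosen by hand for illustration (it is not claimed to be the optimal one): $L$ asks the two intermediate queries $x_{NY} + x_{NJ}$ and $x_{CA} + x_{WA}$, and $B$ recombines them. All function names and the unit counts are hypothetical.

```python
import random


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def scale_of(B):
    """Phi(B): sum of squared entries of B (Definition 4.1)."""
    return sum(b * b for row in B for b in row)


def l1_sensitivity(L):
    """Delta(L): largest absolute column sum of L."""
    return max(sum(abs(row[j]) for row in L) for j in range(len(L[0])))


def sample_laplace(scale, rng):
    """Zero-mean Laplace sample: difference of two i.i.d. unit exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def lrm_answer(B, L, counts, epsilon, rng):
    """Equation (8): B (L D + Lap(Delta(L)/epsilon)^r)."""
    noise_scale = l1_sensitivity(L) / epsilon
    noisy = [sum(l * x for l, x in zip(row, counts))
             + sample_laplace(noise_scale, rng) for row in L]
    return [sum(b * y for b, y in zip(row, noisy)) for row in B]


def lrm_expected_error(B, L, epsilon):
    """Lemma 4.2: 2 * Phi(B) * Delta(L)^2 / epsilon^2."""
    return 2.0 * scale_of(B) * l1_sensitivity(L) ** 2 / epsilon ** 2


W = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
B = [[1, 1], [1, 0], [0, 1]]          # hand-picked, for illustration only
L = [[1, 1, 0, 0], [0, 0, 1, 1]]
counts = [82, 67, 119, 95]            # made-up unit counts
answers = lrm_answer(B, L, counts, epsilon=1.0, rng=random.Random(0))
```

For this decomposition $\Phi(B) = 4$ and $\Delta(L) = 1$, so Lemma 4.2 gives an expected squared error of $8/\epsilon^2$, compared with $24/\epsilon^2$ when the Laplace mechanism is applied to $Q$ directly.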

Therefore, to find the best workload decomposition, it suffices to solve for the optimal $B$ and $L$ that
minimize $\Phi(B)(\Delta(L))^2$, while satisfying $W = BL$. However, this optimization program is
difficult to solve, because (i) the objective function involves the product of $\Phi(B)$ and the square of
$\Delta(L)$, and (ii) $\Delta(L)$ may not be differentiable. To address this problem, we first prove an important
property of workload decomposition, which implies that the exact value of $\Delta(L)$ is not important.


Lemma 4.3. Given a workload decomposition $W = BL$, we can always construct another
decomposition $W = B'L'$ satisfying (i) $\Delta(L') = 1$ and (ii) $(B', L')$ leads to the same expected
squared error as $(B, L)$, i.e.,

$$\Phi(B)\Delta(L)^2 = \Phi(B')(\Delta(L'))^2 = \Phi(B')$$

Proof. We obtain $B'$ and $L'$ by $B' = \Delta(L) B$, $L' = \frac{1}{\Delta(L)} L$. Based on the definition of
$\mathcal{L}_1$ sensitivity, we have

$$\Delta(L') = \max_j \sum_i |L'_{ij}| = \max_j \sum_i \frac{|L_{ij}|}{\Delta(L)} = \frac{\Delta(L)}{\Delta(L)} = 1$$

Meanwhile, according to Definition 4.1, we have:

$$\Phi(B') = \sum_{i,j} (B'_{ij})^2 = \Delta(L)^2 \sum_{i,j} B_{ij}^2 = \Phi(B)\Delta(L)^2$$

This leads to the conclusion of the lemma. □
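The rescaling used in the proof of Lemma 4.3 can be written down directly. The following sketch uses hypothetical helper names and a hand-made decomposition whose sensitivity is 2; after normalization the sensitivity becomes 1 while the product $\Phi(B)\Delta(L)^2$ is unchanged.

```python
def scale_of(B):
    """Phi(B): sum of squared entries of B."""
    return sum(b * b for row in B for b in row)


def l1_sensitivity(L):
    """Delta(L): largest absolute column sum of L."""
    return max(sum(abs(row[j]) for row in L) for j in range(len(L[0])))


def normalize_decomposition(B, L):
    """Return (B', L') with B' = Delta(L) * B and L' = L / Delta(L)."""
    delta = l1_sensitivity(L)
    B_new = [[delta * b for b in row] for row in B]
    L_new = [[l / delta for l in row] for row in L]
    return B_new, L_new


# A decomposition of the running-example workload with Delta(L) = 2.
B = [[0.5, 0.5], [0.5, 0.0], [0.0, 0.5]]
L = [[2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0]]
B2, L2 = normalize_decomposition(B, L)

before = scale_of(B) * l1_sensitivity(L) ** 2
after = scale_of(B2) * l1_sensitivity(L2) ** 2
```

Because the product $B'L'$ equals $BL$, the normalized pair still decomposes the same workload, which is why the optimization can fix $\Delta(L) = 1$ without loss of generality.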

It follows from the above lemma that there must be an optimal decomposition with $\mathcal{L}_1$
sensitivity equal to 1, because we can always apply Lemma 4.3 to transform an optimal decomposition
whose $\mathcal{L}_1$ sensitivity is not 1 to another optimal decomposition whose $\mathcal{L}_1$ sensitivity is 1.
Therefore, it suffices to fix $\Delta(L)$ to 1 in the optimization program. Meanwhile, according to
properties of the matrix trace, we have $\Phi(B) = \mathrm{tr}(B^T B)$. Thus, we arrive at the following theorem.

Theorem 4.4. Given the workload $W$, a workload decomposition $W = BL$ minimizes the
expected squared error of the queries, if $(B, L)$ is the optimal solution to the following program:

$$\min_{B, L}\ \frac{1}{2} \mathrm{tr}(B^T B) \tag{9}$$
$$\text{s.t.}\quad W = BL, \qquad \forall j:\ \sum_{i=1}^{r} |L_{ij}| \le 1$$

The constant factor 1/2 in the objective function above simplifies the notations in the following
sections; it does not affect the optimal solution of the program. We omit the proof since it is already
clear from the discussions above. Solving the above optimization program is rather difficult, since
it involves a non-linear objective function and complex constraints. We present a relaxation of the
problem in Section 4.3, and our solution in Section 5.


4.2. Utility Analysis and Budget Selection 

In practice, users are often unsure about how to set the privacy parameter $\epsilon$ involved in
$\epsilon$-differential privacy. Instead, setting the desired utility level of the query results is much more
intuitive. Given the user-specified utility, this subsection derives the smallest $\epsilon$ value for LRM that
satisfies the utility requirement. Note that smaller values of $\epsilon$ correspond to stronger privacy
protection. We use a common definition of query result utility called $(\xi, \eta)$-usefulness
[Blum et al. 2008], as follows.

Definition 4.5. Given a mechanism $M$, query set $Q$, sensitive data $D$, and parameters $\xi > 0$
and $0 < \eta < 1$, we say that $M$ is $(\xi, \eta)$-useful with respect to $Q$ and $D$ under the $\|\cdot\|_*$-norm if the
following inequality holds:

$$\Pr(\|M(Q, D) - Q(D)\|_* \ge \xi) \le \eta$$

where the $\|\cdot\|_*$-norm can be any vector norm definition. In our analysis, we consider the
$\|\cdot\|_1$-norm and the $\|\cdot\|_\infty$-norm.

Given user-specified values of $\xi$ and $\eta$, we now derive the minimum value for $\epsilon$ with which LRM
achieves $(\xi, \eta)$-usefulness. The derivation uses Markov's inequality and the Chernoff bound, as
follows.


Lemma 4.6. Markov's Inequality and the Chernoff Bound [Billingsley 2012]. Given a
non-negative random variable $X$ and $t > 0$, the following inequality holds:

$$\Pr(X \ge t) \le \frac{E[X]}{t}$$

Moreover, for any $s > 0$, we have:

$$\Pr(X \ge t) = \Pr(e^{sX} \ge e^{st}) \le \frac{E[e^{sX}]}{e^{st}}$$

The minimum $\epsilon$ value is given in the following theorem.

Theorem 4.7. Utility of LRM under $\epsilon$-differential privacy. Given query set $Q$, database $D$,
and user-specified parameters $\xi > 0$ and $0 < \eta < 1$, (i) $M_{LRM,\epsilon}$ with the optimal
decomposition $W = BL$ solved from Program (9) returns $(\xi, \eta)$-useful results of $Q$ on $D$ under the
$\|\cdot\|_1$-norm, when the privacy parameter $\epsilon$ satisfies
$\epsilon \ge \left(2\,|||B|||_1 (r \cdot \ln 2 - \ln \eta)\right)/\xi$. (ii) Meanwhile, $M_{LRM,\epsilon}$ with the optimal
decomposition achieves $(\xi, \eta)$-usefulness under the $\|\cdot\|_\infty$-norm, when
$\epsilon \ge \left(2\,|||B|||_\infty \left(\sum_{i=1}^{r} \ln\left(\frac{i}{i - 0.5}\right) - \ln \eta\right)\right)/\xi$.

Proof. (i) We first prove the utility of LRM under the $\|\cdot\|_1$-norm. Let $X$ be the Laplace noise
vector injected to the results of intermediate queries corresponding to $L$. We have:

$$\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_1 = \|B(LD + X) - WD\|_1$$
$$= \|B \cdot X\|_1 = |||B \cdot X|||_1 \le |||B|||_1 \cdot |||X|||_1 = |||B|||_1 \cdot \|X\|_1$$

According to the Laplace mechanism, $X_1, X_2, \ldots, X_r$ are i.i.d. random variables following the
zero-mean Laplace distribution with scale $\Delta(L)/\epsilon$. Since $L$ is obtained by solving Program (9),
we have $\Delta(L) = 1$. Therefore, the scale of each of the Laplace variables $X_i$, $1 \le i \le r$, is $1/\epsilon$.
According to properties of the Laplace distribution, $|X_i|$ follows the exponential distribution with
rate parameter equal to $\epsilon$. Let $Y = \|X\|_1 = |X_1| + |X_2| + \cdots + |X_r|$. Then, according to
properties of the exponential distribution, $Y$ follows the Erlang distribution. Specifically, the
probability density function of $Y$ is:

$$\Pr(Y = x) = \frac{\epsilon^r x^{r-1} e^{-\epsilon x}}{(r - 1)!}$$

For any positive number $t$ such that $E[e^{tY}]$ exists, we have:

$$E[e^{tY}] = \int_0^{\infty} e^{tx}\, \frac{\epsilon^r x^{r-1} e^{-\epsilon x}}{(r - 1)!}\, dx = \left(1 - \frac{t}{\epsilon}\right)^{-r}, \quad t < \epsilon$$

Moreover, for any real number $c$, according to Lemma 4.6, we have:

$$\Pr(Y \ge c) = \Pr(e^{tY} \ge e^{tc}) \le \frac{E[e^{tY}]}{e^{tc}}$$

Setting $t = \frac{\epsilon}{2}$ and $c = \frac{\xi}{|||B|||_1}$, we obtain:

$$\Pr\left(Y \ge \frac{\xi}{|||B|||_1}\right) \le \frac{\left(\frac{1}{2}\right)^{-r}}{e^{\frac{\epsilon \xi}{2 |||B|||_1}}}$$

Therefore, we have:

$$\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_1 \le |||B|||_1 \cdot Y$$
$$\Rightarrow \forall \xi,\ \Pr(\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_1 \ge \xi) \le \Pr\left(Y \ge \frac{\xi}{|||B|||_1}\right) \le \frac{\left(\frac{1}{2}\right)^{-r}}{e^{\frac{\epsilon \xi}{2 |||B|||_1}}} \tag{10}$$

When $\epsilon \ge \left(2\,|||B|||_1 (r \cdot \ln 2 - \ln \eta)\right)/\xi$, the above probability is thus bounded by $\eta$. This
finishes the proof for claim (i) in the theorem.

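(ii) Next we focus on the $\|\cdot\|_\infty$-norm. Let $X$ denote the same meaning as in the proof of part (i).
Then, we have:

$$\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_\infty = \|B \cdot X\|_\infty \le |||B|||_\infty \cdot \|X\|_\infty$$

The inequality above holds due to the fact that $\|Rx\|_\infty \le |||R|||_\infty \cdot \|x\|_\infty$ for any matrix $R$
and vector $x$. Let $Y = \|X\|_\infty = \max(|X_1|, |X_2|, \ldots, |X_r|)$. Similar to part (i) of the proof, each
$|X_i|$, $1 \le i \le r$, follows the exponential distribution with rate $\epsilon$. According to the memoryless
property of the exponential distribution, we create a chain of variables, as follows:

$$Y = \max(|X_1|, |X_2|, \ldots, |X_r|) = X_{\lambda = r\epsilon} + X_{\lambda = (r-1)\epsilon} + \cdots + X_{\lambda = \epsilon} \tag{11}$$

where each $X_{\lambda = x}$ denotes an independent exponential random variable with rate $x$. Intuitively,
$X_{\lambda = r\epsilon}$ models the distribution of the smallest value among $|X_1|, |X_2|, \ldots, |X_r|$;
$X_{\lambda = (r-i+1)\epsilon}$, $1 < i \le r$, models the difference between the $i$-th smallest value and the
$(i-1)$-th smallest value among $|X_1|, |X_2|, \ldots, |X_r|$. The sum thus yields the maximum value among
$|X_1|, |X_2|, \ldots, |X_r|$. Similar to part (i) of the proof, we further derive:

$$E[e^{tY}] = E[e^{t X_{\lambda = r\epsilon}}] \cdot E[e^{t X_{\lambda = (r-1)\epsilon}}] \cdots E[e^{t X_{\lambda = \epsilon}}]$$

Because $E[e^{t X_{\lambda = a}}] = \int_0^{\infty} e^{tx} \cdot a e^{-ax}\, dx = \frac{a}{a - t}$ for any $t < a$, we reach:

$$\forall t < \epsilon,\quad E[e^{tY}] = \prod_{i=1}^{r} \frac{i\epsilon}{i\epsilon - t}$$

Finally, according to Lemma 4.6, we have the following inequality:

$$\Pr(Y \ge c) = \Pr(e^{tY} \ge e^{tc}) \le \frac{E[e^{tY}]}{e^{tc}} = \frac{\prod_{i=1}^{r} \frac{i\epsilon}{i\epsilon - t}}{e^{tc}}$$

With the choice of $t = \frac{\epsilon}{2}$ and $c = \frac{\xi}{|||B|||_\infty}$, we obtain:

$$\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_\infty \le |||B|||_\infty \cdot \|X\|_\infty$$
$$\Rightarrow \forall \xi,\ \Pr(\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_\infty \ge \xi) \le \Pr\left(\|X\|_\infty \ge \frac{\xi}{|||B|||_\infty}\right)$$
$$\Rightarrow \forall \xi,\ \Pr(\|M_{LRM,\epsilon}(Q, D) - Q(D)\|_\infty \ge \xi) \le \frac{\prod_{i=1}^{r} \frac{i\epsilon}{i\epsilon - \epsilon/2}}{e^{\frac{\epsilon \xi}{2 |||B|||_\infty}}} = \frac{\prod_{i=1}^{r} \frac{i}{i - 0.5}}{e^{\frac{\epsilon \xi}{2 |||B|||_\infty}}}$$

When $\epsilon \ge \left(2\,|||B|||_\infty \left(\sum_{i=1}^{r} \ln\left(\frac{i}{i - 0.5}\right) - \ln \eta\right)\right)/\xi$, the above probability is
bounded by $\eta$.

□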
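The two thresholds in Theorem 4.7 are straightforward to evaluate once $B$ is known. The sketch below is illustrative: the function names are hypothetical, the matrix $B$ is a hand-picked example rather than an optimized decomposition, and the two `chernoff_bound_*` helpers simply re-evaluate the tail bounds from the proof so that the threshold can be checked numerically.

```python
import math


def induced_1_norm(B):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in B) for j in range(len(B[0])))


def induced_inf_norm(B):
    """Maximum absolute row sum."""
    return max(sum(abs(b) for b in row) for row in B)


def min_epsilon_l1(B, xi, eta):
    """Theorem 4.7 (i): 2 |||B|||_1 (r ln 2 - ln eta) / xi."""
    r = len(B[0])
    return 2.0 * induced_1_norm(B) * (r * math.log(2.0) - math.log(eta)) / xi


def min_epsilon_linf(B, xi, eta):
    """Theorem 4.7 (ii): 2 |||B|||_inf (sum_i ln(i/(i-0.5)) - ln eta) / xi."""
    r = len(B[0])
    log_sum = sum(math.log(i / (i - 0.5)) for i in range(1, r + 1))
    return 2.0 * induced_inf_norm(B) * (log_sum - math.log(eta)) / xi


def chernoff_bound_l1(B, xi, epsilon):
    """Right-hand side of Inequality (10)."""
    r = len(B[0])
    return 2.0 ** r / math.exp(epsilon * xi / (2.0 * induced_1_norm(B)))


def chernoff_bound_linf(B, xi, epsilon):
    """Final tail bound from part (ii) of the proof."""
    r = len(B[0])
    prod = 1.0
    for i in range(1, r + 1):
        prod *= i / (i - 0.5)
    return prod / math.exp(epsilon * xi / (2.0 * induced_inf_norm(B)))


B = [[1, 1], [1, 0], [0, 1]]   # illustrative B with r = 2
eps_l1 = min_epsilon_l1(B, xi=10.0, eta=0.05)
eps_linf = min_epsilon_linf(B, xi=10.0, eta=0.05)
```

At the returned threshold the corresponding tail bound equals $\eta$, and any larger $\epsilon$ makes it smaller, which mirrors the last step of each part of the proof.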


4.3. Relaxed Workload Decomposition 

Program (9) is rather difficult to solve, since it contains a non-linear objective and complex
constraints. To devise a stable numerical solution, we relax the formulation so that $BL$ does not
necessarily match $W$ exactly, but within a small error tolerance. To do this, we introduce a new parameter
$\gamma$ to bound the difference between $W$ and $BL$ in terms of the Frobenius norm. This leads to the
following optimization program:

$$\min_{B, L}\ \frac{1}{2} \mathrm{tr}(B^T B) \tag{12}$$
$$\text{s.t.}\quad \|W - BL\|_F \le \gamma, \qquad \forall j:\ \sum_{i=1}^{r} |L_{ij}| \le 1$$


The following theorem analyzes the error of LRM with the optimal decomposition obtained by
solving Program (12).

Theorem 4.8. The expected squared error of $M_{LRM,\epsilon}(Q, D)$ using the optimal decomposition
$(B, L)$ solved from Program (12) is at most

$$2\,\mathrm{tr}(B^T B)/\epsilon^2 + \gamma^2 \sum_i x_i^2$$

Proof. When $W \ne BL$, there are two sources of error. The first is the added Laplace noise.
According to Lemma 4.2, the error incurred by the Laplace noise is at most
$\frac{2}{\epsilon^2} \Phi(B)(\Delta(L))^2 \le \frac{2}{\epsilon^2} \mathrm{tr}(B^T B)$.

The second source of the error is due to the difference between $W$ and $BL$. The incurred
squared error is bounded by:

$$((W - BL)D)^T (W - BL)D \le \|W - BL\|_F^2\, D^T D = \|W - BL\|_F^2 \cdot \sum_{i=1}^{n} x_i^2 \le \gamma^2 \sum_{i=1}^{n} x_i^2$$

The first inequality above is due to the Cauchy-Schwarz inequality. Since the Laplace noise has
zero mean, by linearity of expectation the two expected squared errors can be simply summed up.
This leads to the conclusion of the theorem.

□

While Theorem 4.8 implies the possibility of estimating the optimal $\gamma$, it is not practical to
implement it directly, because this estimation depends on the data, i.e., $\sum_i x_i^2$. In our experiments,
we test different values of $\gamma$, report their relative performance, and describe guidelines for setting the
appropriate $\gamma$ independently of the underlying data.

5. SOLVING FOR THE OPTIMAL WORKLOAD DECOMPOSITION 

This section solves the relaxed workload decomposition problem defined in Program (12). This
program is rather difficult to solve, because it is neither convex nor differentiable. In the following,
Section 5.1 describes an effective and efficient solution, based on the inexact augmented Lagrangian
method [Conn et al. 1997; Lin et al. 2010]. Section 5.2 proves that the proposed solution always
converges, and analyzes its convergence rate.

5.1. Solution Based on Augmented Lagrangian Method 

Observe that Program (12) is a constrained optimization problem with a large number of unknowns,
a non-linear objective and rather complex constraints. Since there is no known analytic solution to
such a problem, we focus on numerical solutions. Furthermore, Program (12) is difficult to tackle
even with numerical methods, due to three main challenges. First and foremost, there is a set of
non-differentiable constraints $\forall j: \sum_i |L_{ij}| \le 1$, which rules out many generic techniques for
solving constrained optimization problems, such as the Lagrange multiplier method, which are limited
to problems with differentiable constraints. Second, the non-differentiable constraints involve the
unknown matrix $L$, whereas the objective function involves another unknown matrix $B$, whose
relationship to $L$ is rather complex (i.e., in constraint $\|W - BL\|_F \le \gamma$); consequently, it is non-trivial
to apply specialized methods for handling the non-differentiable constraints. Finally, Program (12)
is not convex with respect to the unknowns $B$ and $L$.

The main idea of the proposed solution is to break down Program (12) into simpler, solvable
subproblems. Since the most difficult part of Program (12) is the existence of the non-differentiable
constraints $\forall j: \sum_i |L_{ij}| \le 1$, we aim to break down the whole problem into subproblems with only
these constraints, and an objective function that only involves the unknown $L$, not $B$. Then, we use a
specialized technique to solve each of these subproblems. Specifically, we first eliminate the constraint
$\|W - BL\|_F \le \gamma \approx 0$ using the augmented Lagrangian method, which runs in multiple iterations,
each of which solves a subproblem with only the constraints $\forall j: \sum_i |L_{ij}| \le 1$. Then, inside each
iteration, we remove $B$ from the objective function of the subproblem, by alternately optimizing for
$B$ and $L$. The result is a set of subproblems with only the constraints $\forall j: \sum_i |L_{ij}| \le 1$ as well as an
objective function that has only $L$ as unknowns. Each of these subproblems is then solved by applying
a special solver called Nesterov's first order optimal gradient method [Nesterov 2003]. An important
optimization is that we apply the inexact augmented Lagrangian method [Conn et al. 1997;
Lin et al. 2010], which does not solve the subproblem exactly in each iteration, leading to both
increased efficiency and stability.

ALGORITHM 1: Workload Matrix Decomposition

1: Initialize $\pi^{(0)} = 0 \in \mathbb{R}^{m \times n}$, $\beta^{(0)} = 1$, $k = 1$
2: while not converged do
3:     while not converged do
4:         $B^{(k)} \leftarrow$ update $B$ using Equation (14)
5:         $L^{(k)} \leftarrow$ run Algorithm 3 to update $L$ according to Program (15)
6:     Compute $\tau = \|W - B^{(k)} L^{(k)}\|_F$
7:     if $\tau$ is sufficiently small or $\beta$ is sufficiently large then
8:         return $B^{(k)}$ and $L^{(k)}$
9:     if $k$ is a multiple of 10 then
10:        increase the penalty parameter $\beta$
11:    $\pi^{(k+1)} = \pi^{(k)} + \beta^{(k)} (W - B^{(k)} L^{(k)})$
12:    $k = k + 1$

Algorithm 1 shows the proposed solution for Program (12). First, we apply the inexact augmented
Lagrangian method to eliminate the linear constraint $\|W - BL\|_F \le \gamma \approx 0$, as follows: we add to
the objective function (i) a positive penalty item $\beta \in \mathbb{R}$ and (ii) the Lagrange multiplier
$\pi \in \mathbb{R}^{m \times n}$. $\beta$ and $\pi$ are iteratively updated, following the strategy in [Conn et al. 1997;
Lin et al. 2010]. In each iteration, the values of $\beta$ and $\pi$ are fixed, and the algorithm aims to find
values for $B$ and $L$ that solve the following subproblem:

$$\min_{B, L}\ \mathcal{J}(B, L, \beta, \pi) = \frac{1}{2} \mathrm{tr}(B^T B) + \langle \pi, W - BL \rangle + \frac{\beta}{2} \|W - BL\|_F^2 \tag{13}$$
$$\text{s.t.}\quad \forall j:\ \sum_{i=1}^{r} |L_{ij}| \le 1$$

Next we eliminate the unknowns $B$ from the objective function of the above subproblem, Program
(13). Observe that this is a bi-convex optimization problem with respect to $B$ and $L$, meaning that it is
convex with respect to $B$ (resp., $L$), once we fix $L$ (resp., $B$) to a constant. Hence, we solve it by
alternately optimizing $B$ and $L$ (lines 3-5 of Algorithm 1). Note that following the inexact augmented
Lagrangian multiplier methodology, it is not necessary to obtain the exact optimal values of
$B$ and $L$; instead, a small number of iterations of the inner while-loop (lines 4-5) suffices. We first focus
on optimizing $B$, treating $L$ as constant. Observe that $\mathcal{J}(\cdot)$ is convex with respect to $B$. Hence, the
optimal $B$ can be obtained by solving $\frac{\partial \mathcal{J}}{\partial B} = 0$. In particular, the gradient with respect to $B$ is:

$$\frac{\partial \mathcal{J}}{\partial B} = B - \pi L^T + \beta B L L^T - \beta W L^T$$

Solving $B$ from $\frac{\partial \mathcal{J}}{\partial B} = 0$, we obtain:

$$B = \left(\beta W L^T + \pi L^T\right) \left(\beta L L^T + I\right)^{-1} \tag{14}$$
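The closed-form update in Equation (14) can be checked numerically by confirming that the gradient vanishes at the returned $B$. The sketch below is an assumption-laden illustration: all helper names are hypothetical, the linear system is solved with a small Gauss-Jordan routine instead of forming the inverse explicitly, and the inputs ($L$, $\pi$, $\beta$) are arbitrary test values rather than iterates of Algorithm 1.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve(A, Y):
    """Solve A Z = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(v) for v in Y[i]]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def update_B(W, L, pi, beta):
    """Equation (14): B = (beta W L^T + pi L^T)(beta L L^T + I)^{-1}."""
    Lt = transpose(L)
    r = len(L)
    LLt = matmul(L, Lt)
    A = [[beta * LLt[i][j] + (1.0 if i == j else 0.0) for j in range(r)]
         for i in range(r)]
    M = [[beta * w + p for w, p in zip(wrow, prow)]
         for wrow, prow in zip(W, pi)]
    rhs = matmul(M, Lt)
    # B A = rhs is equivalent to A B^T = rhs^T because A is symmetric.
    return transpose(solve(A, transpose(rhs)))


def grad_B(B, W, L, pi, beta):
    """dJ/dB = B - pi L^T + beta B L L^T - beta W L^T."""
    Lt = transpose(L)
    BLLt = matmul(matmul(B, L), Lt)
    piLt = matmul(pi, Lt)
    WLt = matmul(W, Lt)
    return [[B[i][j] - piLt[i][j] + beta * BLLt[i][j] - beta * WLt[i][j]
             for j in range(len(B[0]))] for i in range(len(B))]


W = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
L = [[1, 1, 0, 0], [0, 0, 1, 1]]
pi = [[0.1] * 4 for _ in range(3)]     # arbitrary multiplier values
B = update_B(W, L, pi, beta=2.0)
```

Solving the linear system rather than inverting $\beta L L^T + I$ is a common numerical choice; the matrix is symmetric positive definite for $\beta > 0$, so the system always has a unique solution.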

Next we show how to optimize L with a fixed B. This is equivalent to the following quadratic 
program: 


ACM Transactions on Database Systems, Vol. V, No. N, Article A, Publication date: January YYYY. 

Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy 


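$$\min_{L}\ \mathcal{G}(L) = \frac{\beta}{2} \mathrm{tr}\left(L^T B^T B L\right) - \mathrm{tr}\left((\beta W + \pi)^T B L\right) \tag{15}$$
$$\text{s.t.}\quad \forall j:\ \sum_{i=1}^{r} |L_{ij}| \le 1$$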

The gradient of the objective Q (L) respect to L in (fTSl) can be computed as: 

^ = pB^BL-pB^W - B^tt (16) 

oL 

For all L', L" with Vj I ^ 1) ''(j \^ij I < we have the following inequalities: 

\\g{L')-g{L")\\F ^ 

\\L'-L"\\f 

< 

Therefore, the gradient of g{L) is Lipschitz continuous with parameter a; = /3 • 11 1 12 . 

We employ Nesterov’s first order optimal gradient method BNesterov 20031 to solve the program 
in ( fTSl l. Nesterov’s method has a much faster convergence rate than traditional methods such as the 
subgradient method or naive projected gradient descent. The updating rule in the projected gradient 
method is expressed as follows: 

where t denotes the iteration counter, V{L) denotes the Ci projection operator on any L € 

77 > 0 denotes the appropriate step size. One typical choice for 77 is the inverse of the gradient lips¬ 
chitz constant 1 /w, however, this can be sub-optimal when the gradient lipschitz constant is large. 
One can incooperate Beck et al.’s backtracking line search strategy to further accelerate the con¬ 
vergence of the projected gradient algorithm BBeck and Teboulle 20091 . We adopt this line search 
strategy in our algorithm. 

L is updated by gradient descent while ensuring that the Ci regularized constraint on L is satis¬ 
fied. This is done by the Ci projection operator, formulated as the following optimization problem: 

T’(L)=arg min \\L - L\\%, s.t. < 1, (17) 

I 

We observe that Equation ( fTTI i can be decoupled into n independent Ci regularized sub-problems: 

arg min \\l-l\\l,s.t. y^|ri|<l 

I 

where I = L^^\j = 1, 2, • • • ,n, is the column of L^*'\ Such a projection operator can be 
solved efficiently by Ci projection methods in C>(r log r) time BDuchi et al. 20081 , as described in 
Algorithm|2] The complete algorithm for solving Program (fTSl l is summarized in Algorithm^ 

5.2. Convergence Analysis 

This subsection analyzes the convergence properties of the proposed workload decomposition
algorithm. In each iteration, Algorithm 1 solves a sequence of Lagrangian subproblems by optimizing
B (step 4) and L alternately. The algorithm stops when a sufficiently small $\gamma$ is obtained
or the penalty parameter $\beta$ is sufficiently large. It suffices to guarantee that L converges to a locally
optimal solution [Lin et al. 2010; Wen et al. 2012a; Wen et al. 2012b].


ALGORITHM 2: Algorithm for $\mathcal{L}_1$ Ball Projection

1: input: A vector $l \in \mathbb{R}^r$

2: sort $l$ into $v$ such that $v_1 \ge v_2 \ge \cdots \ge v_r$

3: find $\rho = \max\{ i \in [r] : v_i - \frac{1}{i}(\sum_{k=1}^{i} v_k - 1) > 0 \}$

4: compute $\theta = \frac{1}{\rho}(\sum_{i=1}^{\rho} v_i - 1)$

5: output $\hat{l} \in \mathbb{R}^r$ s.t. $\hat{l}_i = \max(l_i - \theta, 0)$, $i \in [r]$


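ALGORITHM 3: Nesterov's Projected Gradient Method

1: input: $g(L)$, $\partial g / \partial L$, $L^{(0)}$

2: $\chi = r \cdot n \cdot \ldots$, Lipschitz parameter: $\omega^{(0)} = 1$

3: Initializations: $L^{(1)} = L^{(0)}$, $\delta^{(-1)} = 0$, $\delta^{(0)} = 1$, $t = 1$

4: while not converged do

5:   $\alpha = \frac{\delta^{(t-2)} - 1}{\delta^{(t-1)}}$, $S = L^{(t)} + \alpha (L^{(t)} - L^{(t-1)})$

6:   for $j = 0$ to $\cdots$ do

7:     $\omega = 2^j \omega^{(t-1)}$, $U = S - \frac{1}{\omega} \nabla g(S)$

8:     Project $U$ to the feasible set to obtain $L^{(t+1)}$ (i.e., solve Equation (17))

9:     if $\|S - L^{(t+1)}\|_F < \chi$ then

10:      return;

11:    Define function: $J_{\omega,S}(U) = g(S) + \langle \nabla g(S), U - S \rangle + \frac{\omega}{2} \|U - S\|_F^2$

12:    if $g(L^{(t+1)}) \le J_{\omega,S}(L^{(t+1)})$ then

13:      break;

14:  Set $\delta^{(t)} = \frac{1 + \sqrt{1 + 4 (\delta^{(t-1)})^2}}{2}$

15:  $t = t + 1$

16: return $L^{(t)}$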
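
To make the structure of Algorithm 3 concrete, the following Python sketch implements a generic
accelerated projected gradient loop with backtracking on the Lipschitz estimate, in the style of
[Beck and Teboulle 2009]. It is an illustration under stated assumptions rather than the authors' code:
the function name, the flat-list representation of the variable, the doubling rule for `omega`, and the
default tolerance are all choices made for this sketch.

```python
import math

def accelerated_projected_gradient(g, grad, project, x0, omega=1.0, tol=1e-10, max_iter=1000):
    """Minimize g over the feasible set defined by `project`, starting from x0."""
    x = list(x0)   # current iterate
    s = list(x0)   # extrapolated search point
    delta = 1.0    # momentum sequence
    for _ in range(max_iter):
        gs, ds = g(s), grad(s)
        while True:  # backtracking: increase omega until the quadratic model upper-bounds g
            u = project([si - di / omega for si, di in zip(s, ds)])
            diff = [ui - si for ui, si in zip(u, s)]
            model = (gs + sum(d * e for d, e in zip(ds, diff))
                     + 0.5 * omega * sum(e * e for e in diff))
            if g(u) <= model + 1e-15:
                break
            omega *= 2.0
        if math.sqrt(sum(e * e for e in diff)) < tol:
            return u  # the projected step no longer moves the search point
        delta_next = (1.0 + math.sqrt(1.0 + 4.0 * delta * delta)) / 2.0
        s = [ui + (delta - 1.0) / delta_next * (ui - xi) for ui, xi in zip(u, x)]
        x, delta = u, delta_next
    return x
```

In the setting of this section, `g` and `grad` would be the objective and the gradient of Equation (16)
applied to a flattened L, and `project` would apply the column-wise $\mathcal{L}_1$ ball projection.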


In general, penalty methods have the property that when the global (or local) minimizers of the
subproblem are found, every limit point is a global (or local) minimizer of the original problem
[Fiacco and McCormick 1968]. This property is preserved by their Augmented Lagrangian Multiplier
counterparts. Therefore, the proposed solution for the workload decomposition problem converges
whenever the bi-convex optimization subproblem in Program (15.11) converges. Regarding
the convergence properties of the bi-convex optimization subproblem, a past study [Bertsekas 1999]
on bi-convex optimization has shown that block coordinate descent is guaranteed to converge to
a stationary point for strictly convex problems. However, the subproblem in Program (15.11) is not
strictly convex (though it is convex); moreover, the subproblem may have multiple optimal
solutions, which may hinder its convergence. Fortunately, for bi-convex optimization problems
that involve only two blocks, [Grippo and Sciandrone 2000] shows that strict convexity of
the subproblem is not required; every limit point of the generated sequence is a stationary point.
Accordingly, the bi-convex optimization subproblem exhibits nice convergence properties. In the
following, we formalize and prove the convergence results of the proposed algorithm.

We first present the first-order KKT conditions of the optimization problem in Program (12).
Introducing Lagrange multipliers $\mu \in \mathbb{R}^n$ and $\pi \in \mathbb{R}^{m \times n}$ for the inequality constraints
$\forall j,\ \sum_i |L_{ij}| \le 1$ and the linear constraints $W = BL$ respectively, we derive the following KKT
conditions of the optimization problem:

$$\begin{aligned}
& \mu \ge 0 && \text{(Non-Negativity)} \\
& W = BL,\ \forall j,\ \sum_{i}^{r} |L_{ij}| \le 1 && \text{(Feasibility)} \\
& B = \pi L^T,\ 0 \in \sum_j \mu_j\, \partial\Big(\sum_i |L_{ij}|\Big) - B^T \pi && \text{(Optimality)} \\
& \forall j,\ \mu_j \Big(\sum_i |L_{ij}| - 1\Big) = 0 && \text{(Complementary Slackness)}
\end{aligned} \qquad (18)$$


The following theorem establishes the convergence properties of the proposed algorithm, under
the assumption that the iterates generated by Algorithm 1 exhibit no jumping behavior. Note that
a similar condition was used in [Wen et al. 2012a; Wen et al. 2012b].

Theorem 5.1. Convergence of Algorithm 1. Let $X = (B, L, \pi)$ and $\{X^{(k)}\}_{k=1}^{\infty}$ be the
intermediate results of Algorithm 1 after the k-th iteration. Assume that $\{X^{(k)}\}_{k=1}^{\infty}$ is bounded and
$\lim_{k \to \infty} (X^{(k+1)} - X^{(k)}) = 0$. Then any accumulation point of $\{X^{(k)}\}_{k=1}^{\infty}$ satisfies the KKT
conditions presented in Equation (18). In other words, whenever $\{X^{(k)}\}_{k=1}^{\infty}$ converges, it converges to
a first-order KKT optimal point.


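Proof. Since $L^{(k+1)}$ is the global optimal solution of Program (15), by the KKT optimality
condition, there exists $\mu \ge 0$ such that the following holds:

$$0 \in \frac{\partial g}{\partial L^{(k+1)}} + \sum_j \mu_j \frac{\partial \sum_i |L^{(k+1)}_{ij}|}{\partial L^{(k+1)}} \qquad (19)$$

Note that $g(L)$ is a convex function with respect to $L$. Hence, the KKT conditions are both
necessary and sufficient conditions for global optimality. Combining Equations (16) and (19), we obtain:

$$0 \in \beta^{(k)} B^{(k+1)T} B^{(k+1)} L^{(k+1)} - \beta^{(k)} B^{(k+1)T} W - B^{(k+1)T} \pi^{(k)} + \sum_j \mu_j \frac{\partial \sum_i |L^{(k+1)}_{ij}|}{\partial L^{(k+1)}} \qquad (20)$$

We derive the following equations according to the update rule for B (at Line 4 in Algorithm 1) and
the Lagrangian multiplier update rule for $\pi$ (at Line 11 in Algorithm 1), respectively:

$$(B^{(k+1)} - B^{(k)})(\beta^{(k)} L^{(k)} L^{(k)T} + I) = \beta^{(k)} W L^{(k)T} + \pi^{(k)} L^{(k)T} - B^{(k)} (\beta^{(k)} L^{(k)} L^{(k)T} + I) \qquad (21)$$

$$\pi^{(k+1)} - \pi^{(k)} = \beta^{(k)} (W - B^{(k+1)} L^{(k+1)}) \qquad (22)$$

Since $\{X^{(k)}\}_{k=1}^{\infty}$ is bounded according to our assumption, the associated sequences in
Equations (20), (21) and (22) are also bounded. Hence, $\lim_{k \to \infty} (X^{(k+1)} - X^{(k)}) = 0$ implies that both
sides of Equations (20), (21) and (22) converge to zero as k approaches infinity. Consequently,

$$W - B^{(k)} L^{(k)} \to 0, \quad B^{(k)} - \pi^{(k)} L^{(k)T} \to 0, \quad \exists \mu:\ -B^{(k+1)T} \pi^{(k)} + \sum_j \mu_j \frac{\partial \sum_i |L^{(k+1)}_{ij}|}{\partial L^{(k+1)}} \ni 0 \qquad (23)$$

where the first limit in Equation (23) is used to derive the other limits. Therefore, the sequence
$\{X^{(k)}\}_{k=1}^{\infty}$ asymptotically satisfies the KKT conditions in Equation (18). This completes the
proof. □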

Next we focus on the convergence rate of the proposed algorithm. The following theorem states
that it converges linearly.

Theorem 5.2. Convergence Rate of Algorithm 1. Let $X = (B, L, \pi)$ and $\{X^{(k)}\}_{k=1}^{\infty}$ be the
intermediate results of Algorithm 1 after the k-th iteration. Assume that $\{X^{(k)}\}_{k=1}^{\infty}$ is bounded and
$\lim_{k \to \infty} (X^{(k+1)} - X^{(k)}) = 0$. Let $(B^{(k)}, L^{(k)})$ be the solution obtained after the k-th iteration
and $(B^*, L^*)$ be the optimal solution to Program (12), we have

In other words, Algorithm 1 converges to the stationary point linearly.

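Proof. Let $B^{(k+1)}$ denote the solution of the Lagrangian sub-problem in the k-th iteration. The
following inequality holds on the sequence of the Lagrangian subproblems:

$$J(B^{(k+1)}, L^{(k+1)}, \pi^{(k)}, \beta^{(k)}) \le \min_{W = BL,\ \forall j \sum_i |L_{ij}| \le 1} J(B, L, \pi^{(k)}, \beta^{(k)}) \le \min_{W = BL,\ \forall j \sum_i |L_{ij}| \le 1} J(B, L, \pi^{*}, \beta^{(k)}) = \min_{W = BL,\ \forall j \sum_i |L_{ij}| \le 1} \frac{1}{2} tr(B^T B) = \frac{1}{2} tr(B^{*T} B^{*}) \qquad (25)$$

By the definition of $J$ and the inequality above, we derive the following inequality:

$$\begin{aligned}
\frac{1}{2} tr(B^{(k+1)T} B^{(k+1)}) &= J(B^{(k+1)}, L^{(k+1)}, \pi^{(k)}, \beta^{(k)}) - \langle \pi^{(k)}, W - B^{(k+1)} L^{(k+1)} \rangle - \frac{\beta^{(k)}}{2} \|W - B^{(k+1)} L^{(k+1)}\|_F^2 \\
&= J(B^{(k+1)}, L^{(k+1)}, \pi^{(k)}, \beta^{(k)}) - \frac{\beta^{(k)}}{2} \Big( \|W - B^{(k+1)} L^{(k+1)} + \pi^{(k)} / \beta^{(k)}\|_F^2 - \|\pi^{(k)} / \beta^{(k)}\|_F^2 \Big) \\
&= J(B^{(k+1)}, L^{(k+1)}, \pi^{(k)}, \beta^{(k)}) - \frac{1}{2 \beta^{(k)}} \Big( \|\pi^{(k+1)}\|_F^2 - \|\pi^{(k)}\|_F^2 \Big) \\
&\le \frac{1}{2} tr(B^{*T} B^{*}) - \frac{1}{2 \beta^{(k)}} \Big( \|\pi^{(k+1)}\|_F^2 - \|\pi^{(k)}\|_F^2 \Big)
\end{aligned} \qquad (26)$$

The third equality holds because of the Lagrangian multiplier update rule:

$$\pi^{(k+1)} = \pi^{(k)} + \beta^{(k)} (W - B^{(k+1)} L^{(k+1)})$$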
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By the non-negativity of norms, we have; 

= itr(B*^B*) - ^ (||7r(^-+i )\\% - IIf) (27) 
Combining Equations (l26l l and (l27l i. we obtain: 

Summing this equality above over i = 0,1..., fc — 1, we have: 

(tr - tr ) = ||7r ('=)\\% - ||7r (°)\\% (28) 

i^O 

Since is non-decreasing, we have: 


mm 

i=0,l,...,fc-l 


tr 




< 


// 3 ® 


(29) 


By the boundedness of $\|\pi^{(k)}\|_F^2 - \|\pi^{(0)}\|_F^2$, we complete the proof. □

Note that although our convergence proof assumes that each subproblem is solved exactly, this
is not required in practice, because the inexact augmented Lagrange multipliers method has been
shown to converge practically as fast as the exact augmented Lagrange multipliers method
[Lin et al. 2010]. Meanwhile, the inexact method requires significantly fewer iterations when
solving the subproblem, leading to much higher efficiency.

Complexity Analysis: Each update on B in Equation (14) takes $O(r^2 m)$ time, while each update
on L consumes $O(r^2 n)$ time. Assuming that Algorithm 1 converges to a local minimum within $N_{in}$
inner iterations (at line 3 in Algorithm 1) and $N_{out}$ outer iterations (line 2 in Algorithm 1), the
overall complexity of Algorithm 1 is $O(N_{in} \times N_{out} \times (r^2 m + r^2 n))$.


6. LRM UNDER $(\epsilon, \delta)$-DIFFERENTIAL PRIVACY

This section extends LRM to $(\epsilon, \delta)$-differential privacy. Section 6.1 formulates the workload
decomposition as an optimization program. Section 6.2 analyzes the utility of LRM. Section 6.3
discusses the algorithm for solving the optimal workload decomposition.


6.1. Workload Decomposition

Similar to the case of $\epsilon$-differential privacy described in Section 4, LRM decomposes the workload
matrix W into W = BL. Then, LRM applies the Gaussian mechanism to the intermediate queries
corresponding to L to enforce $(\epsilon, \delta)$-differential privacy. Finally, LRM combines the noisy results
of the intermediate queries according to B, to obtain the results of Q. Formally, let $\Theta(L)$ be the
$\mathcal{L}_2$ sensitivity of L, i.e., $\Theta(L) = \max_j \sqrt{\sum_i L_{ij}^2}$. LRM under $(\epsilon, \delta)$-differential privacy is
defined as follows:

$$M_{LRM,(\epsilon,\delta)}(Q, D) = B \left( LD + Gau\left( \frac{\Theta(L)}{h(\epsilon, \delta)} \right)^{r} \right), \quad \text{where } h(\epsilon, \delta) = \frac{\epsilon}{\sqrt{8 \ln(2/\delta)}} \qquad (30)$$

Let $\Phi(B)$ be the scale of the decomposition as defined in Definition 4.1, i.e., $\Phi(B) = \sum_{i,j} B_{ij}^2$. The
following lemma shows that the error of LRM is linear in $\Phi(B)$ and quadratic in $\Theta(L)$.

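Lemma 6.1. The expected squared error of $M_{LRM,(\epsilon,\delta)}(Q, D)$ with respect to the
decomposition W = BL is $8 \ln(2/\delta)\, \Phi(B)\, \Theta(L)^2 / \epsilon^2$.

Proof. According to Equation (30), $M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D) = B \left( Gau\left( \frac{\Theta(L)}{h(\epsilon,\delta)} \right)^{r} \right)$. The
expected squared error of LRM is thus $\left( \sum_{i,j} B_{ij}^2 \right) \frac{\Theta(L)^2}{h(\epsilon,\delta)^2}$. Since $\Phi(B) = \sum_{i,j} B_{ij}^2$ and
$h(\epsilon, \delta) = \frac{\epsilon}{\sqrt{8 \ln(2/\delta)}}$, the error can be rewritten as $8 \ln(2/\delta)\, \Phi(B)\, \Theta(L)^2 / \epsilon^2$. □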
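
The mechanism and the error formula of Lemma 6.1 can be illustrated with a short Python sketch. It is
not the authors' implementation: matrices are plain nested lists, the function names are hypothetical,
and it assumes that the argument of Gau(·) is the standard deviation of the noise, which is the reading
under which the error in Lemma 6.1 equals $\Phi(B)$ times the noise variance.

```python
import math
import random

def l2_sensitivity(L):
    """Theta(L): the largest column-wise L2 norm of L."""
    return max(math.sqrt(sum(row[j] ** 2 for row in L)) for j in range(len(L[0])))

def scale(B):
    """Phi(B): the sum of squared entries of B."""
    return sum(b * b for row in B for b in row)

def lrm_gaussian(B, L, D, eps, delta, rng):
    """Return B (L D + noise) with i.i.d. Gaussian noise of std Theta(L) / h(eps, delta)."""
    h = eps / math.sqrt(8.0 * math.log(2.0 / delta))
    sigma = l2_sensitivity(L) / h
    # answer the r intermediate queries and perturb each of them
    noisy = [sum(l * d for l, d in zip(row, D)) + rng.gauss(0.0, sigma) for row in L]
    # recombine the noisy intermediate answers according to B
    return [sum(b * y for b, y in zip(row, noisy)) for row in B]

def expected_squared_error(B, L, eps, delta):
    """Closed form of Lemma 6.1: 8 ln(2/delta) Phi(B) Theta(L)^2 / eps^2."""
    return 8.0 * math.log(2.0 / delta) * scale(B) * l2_sensitivity(L) ** 2 / eps ** 2
```

Because the noise is added to the r intermediate answers only, any query that is a linear combination of
other queries in B inherits exactly the same combination of noise terms.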

Therefore, the best decomposition is the one that minimizes $\Phi(B)\, \Theta(L)^2$. Similar to the case of
$\epsilon$-differential privacy, the particular value of $\Theta(L)$ is not important, as stated in the following
lemma.

Lemma 6.2. Given a workload decomposition W = BL, we can always construct another
decomposition W = B'L' satisfying (i) $\Theta(L') = 1$ and (ii) (B', L') lead to the same expected
squared error of $M_{LRM,(\epsilon,\delta)}$ as (B, L).

The proof is similar to that of Lemma 4.3 and omitted for brevity. Based on Lemma 6.2, we
formulate the following optimization program for finding the best decomposition for $M_{LRM,(\epsilon,\delta)}$:

$$\min_{B, L}\ \frac{1}{2} tr(B^T B) \quad \text{s.t. } W = BL,\ \ \forall j,\ \sum_i L_{ij}^2 \le 1 \qquad (31)$$

6.2. Utility Analysis and Budget Selection

This subsection analyzes the utility of $M_{LRM,(\epsilon,\delta)}$, as well as the choice of the privacy
parameters $(\epsilon, \delta)$ given a user-specified utility constraint. We use $(\xi, \eta)$-usefulness
(Definition 4.5) as the utility measure. The result is stated in the following theorem.

Theorem 6.3. Utility of LRM under $(\epsilon, \delta)$-differential privacy. Given database D and workload
W, for any $\xi > 0$ and $0 < \eta < 1$, mechanism $M_{LRM,(\epsilon,\delta)}$ using the optimal decomposition
W = BL solved from Program (31) has the following utility guarantees: (i) when
$\epsilon \ge \sqrt{6 \ln\frac{2}{\delta} \left( \frac{r}{2} \ln 3 - \ln \eta \right)}\; |||B|||_2 / \xi$, the output of $M_{LRM,(\epsilon,\delta)}$ is $(\xi, \eta)$-useful under the
$\|\cdot\|_2$-norm; (ii) when $\epsilon \ge \sqrt{(6 \ln r + 3 \ln 3)(\ln 2 - \ln \delta) / \eta}\; |||B|||_\infty / \xi$, the output of $M_{LRM,(\epsilon,\delta)}$ is
$(\xi, \eta)$-useful under the $\|\cdot\|_\infty$-norm.

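Proof. (i) Let X be the Gaussian noise vector injected to the intermediate results in LRM.
According to Equation (30), we have:

$$\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_2^2 = \|B(LD + X) - WD\|_2^2 = \|B \cdot X\|_2^2 \le |||B|||_2^2 \cdot \|X\|_2^2$$

The inequality above is due to the fact that $\|Rx\|_2 \le |||R|||_2 \cdot \|x\|_2$ for any matrix R and vector x.
Accordingly, we derive the following:

$$\begin{aligned}
& \|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_2^2 \le |||B|||_2^2 \cdot \|X\|_2^2 \\
\Rightarrow\ & \forall \xi,\ \Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_2^2 \ge \xi^2) \le \Pr(\|X\|_2^2 \cdot |||B|||_2^2 \ge \xi^2) \\
\Rightarrow\ & \forall \xi,\ \Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_2 \ge \xi) \le \Pr\left(\|X\|_2^2 \ge \frac{\xi^2}{|||B|||_2^2}\right)
\end{aligned}$$

Next we focus on properties of X. According to the Gaussian mechanism, the elements of X,
i.e., $X_1, X_2, \cdots, X_r$, follow i.i.d. zero-mean Gaussian distributions with scale $\sigma$. Since the
decomposition W = BL is solved from Program (31), we have $\Theta(L) = 1$. Thus, $\sigma = \frac{\sqrt{2 \ln(2/\delta)}}{\epsilon}$.

Let t, c be any positive numbers. We have:

$$\Pr(\|X\|_2^2 \ge c) = \Pr\left( e^{t \|X\|_2^2 / \sigma^2} \ge e^{t c / \sigma^2} \right) \le \frac{E\left[ e^{t \|X\|_2^2 / \sigma^2} \right]}{e^{t c / \sigma^2}}$$

where the last inequality holds due to Markov's inequality.

Consider the random variable $Y_i = \exp(t X_i^2 / \sigma^2)$, where t is an arbitrary positive number such that
$E[Y_i]$ exists. According to the probability density function $g(x)$ of the Gaussian distribution, we have:

$$E[Y_i] = \int_{-\infty}^{\infty} g(x)\, e^{t x^2 / \sigma^2}\, dx = (1 - 2t)^{-1/2}$$

Based on the above derivations, and the fact that the $X_i$'s are independent variables, we obtain:

$$\Pr(\|X\|_2^2 \ge c) \le \frac{\prod_{i=1}^{r} E\left[ e^{t X_i^2 / \sigma^2} \right]}{e^{t c / \sigma^2}} = \frac{\prod_{i=1}^{r} E[Y_i]}{e^{t c / \sigma^2}}$$

The choice of $t = \frac{1}{3}$, $c = \frac{\xi^2}{|||B|||_2^2}$, and $\sigma = \frac{\sqrt{2 \ln(2/\delta)}}{\epsilon}$ leads to:

$$\Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_2 \ge \xi) \le \frac{(1 - 2t)^{-r/2}}{e^{t c / \sigma^2}} = \frac{3^{r/2}}{\exp\left( \frac{\epsilon^2 \xi^2}{6 \ln(2/\delta)\, |||B|||_2^2} \right)}$$

When $\epsilon \ge \sqrt{6 \cdot \ln\frac{2}{\delta} \cdot \left( \frac{r}{2} \ln 3 - \ln \eta \right)}\; |||B|||_2 / \xi$, the above probability is bounded by $\eta$.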

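(ii) Let X be the Gaussian noise vector injected to the intermediate results as in part (i) of the
proof. We have:

$$\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_\infty = \|B \cdot X\|_\infty \le |||B|||_\infty \cdot \|X\|_\infty = |||\sigma B|||_\infty \cdot \left\| \tfrac{1}{\sigma} X \right\|_\infty$$

The above inequality holds due to the fact that $\|Rx\|_\infty \le |||R|||_\infty \cdot \|x\|_\infty$ for any matrix R and
vector x. Let $Z = \left\| \tfrac{1}{\sigma} X \right\|_\infty^2 = \max\left( (\tfrac{1}{\sigma} X_1)^2, \cdots, (\tfrac{1}{\sigma} X_r)^2 \right)$. We derive:

$$\begin{aligned}
& \|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_\infty^2 \le |||\sigma B|||_\infty^2 \cdot \left\| \tfrac{1}{\sigma} X \right\|_\infty^2 \\
\Rightarrow\ & \forall \xi,\ \Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_\infty^2 \ge \xi^2) \le \Pr(|||\sigma B|||_\infty^2 \cdot Z \ge \xi^2) \\
\Rightarrow\ & \forall \xi,\ \Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_\infty \ge \xi) \le \Pr\left( Z \ge \frac{\xi^2}{|||\sigma B|||_\infty^2} \right)
\end{aligned}$$

By Markov's inequality, we obtain:

$$\Pr\left( Z \ge \frac{\xi^2}{|||\sigma B|||_\infty^2} \right) \le \frac{E[Z] \cdot |||\sigma B|||_\infty^2}{\xi^2}$$

Note that the above bound is tight, even though the Chernoff bound cannot be applied here.

Next we derive an upper bound for the expected value of Z. Let $Y = \tfrac{1}{\sigma} X$. Clearly, $Y_1, Y_2, \cdots, Y_r$
are independent, standard normal random variables. Hence, the $Y_i^2$'s ($1 \le i \le r$) are i.i.d. $\chi_1^2$ variables,
i.e., Chi-square random variables with 1 degree of freedom. The probability density function $f_1$ for
$Y_i^2$ is thus:

$$f_1(x) = \frac{1}{\sqrt{2 \pi x}}\, e^{-x/2}, \quad x > 0$$

Since the function $\exp(\cdot)$ is convex and positive, by Jensen's inequality, for any t such that $E[e^{tZ}]$
exists, we have:

$$e^{t E[Z]} \le E[e^{tZ}] = E\left[ \max_i e^{t Y_i^2} \right] \le \sum_{i=1}^{r} E\left[ e^{t Y_i^2} \right] \qquad (32)$$

Meanwhile, for any $t < \frac{1}{2}$, we have $E[e^{t Y_i^2}] = \int_0^{\infty} \frac{1}{\sqrt{2 \pi x}}\, e^{tx} e^{-x/2}\, dx = (1 - 2t)^{-1/2}$. Combining
this with Equation (32), we obtain an upper bound of the expected value of Z:

$$E[Z] \le \frac{\ln r}{t} - \frac{1}{2t} \ln(1 - 2t)$$

With the choice of $t = \frac{1}{3}$, we have $E[Z] \le 3 \ln r + \frac{3}{2} \ln 3$. Since $\forall j,\ \sum_i L_{ij}^2 \le 1$, the sensitivity over
the batch query workload Q is 1. Since $\sigma = \frac{\sqrt{2 \ln(2/\delta)}}{\epsilon}$, we obtain the following:

$$\begin{aligned}
\forall \xi,\ \Pr(\|M_{LRM,(\epsilon,\delta)}(Q, D) - Q(D)\|_\infty \ge \xi) &\le E[Z] \cdot |||\sigma B|||_\infty^2 / \xi^2 \\
&\le \left( 3 \ln r + \tfrac{3}{2} \ln 3 \right) \cdot |||\sigma B|||_\infty^2 / \xi^2 \\
&= \left( 3 \ln r + \tfrac{3}{2} \ln 3 \right) \cdot (2 \ln(2/\delta)) \cdot |||B|||_\infty^2 / (\epsilon \xi)^2
\end{aligned}$$

When $\epsilon \ge \sqrt{(6 \ln r + 3 \ln 3)(\ln 2 - \ln \delta) / \eta}\; |||B|||_\infty / \xi$, the above probability is bounded by $\eta$. □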
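
As a numeric companion to the two bounds, the Python sketch below computes the smallest $\epsilon$ allowed by
each case and evaluates the corresponding tail bounds from the proof. The function names are
hypothetical, and the $\|\cdot\|_\infty$ case uses the form $\sqrt{(6 \ln r + 3 \ln 3) \ln(2/\delta) / \eta}\; |||B|||_\infty / \xi$ that follows from
the final bound $(3 \ln r + \frac{3}{2} \ln 3)(2 \ln(2/\delta)) |||B|||_\infty^2 / (\epsilon \xi)^2 \le \eta$; the matrix norms of B are assumed to
be supplied by the caller.

```python
import math

def min_eps_l2(r, delta, xi, eta, spectral_norm_B):
    """Smallest eps for (xi, eta)-usefulness under the L2 norm, case (i)."""
    inner = 6.0 * math.log(2.0 / delta) * (0.5 * r * math.log(3.0) - math.log(eta))
    return math.sqrt(inner) * spectral_norm_B / xi

def tail_bound_l2(eps, r, delta, xi, spectral_norm_B):
    """Bound 3^(r/2) / exp(eps^2 xi^2 / (6 ln(2/delta) |||B|||_2^2)) from part (i)."""
    exponent = (eps * xi) ** 2 / (6.0 * math.log(2.0 / delta) * spectral_norm_B ** 2)
    return 3.0 ** (r / 2.0) / math.exp(exponent)

def min_eps_linf(r, delta, xi, eta, inf_norm_B):
    """Smallest eps for (xi, eta)-usefulness under the L-infinity norm, case (ii)."""
    inner = (6.0 * math.log(r) + 3.0 * math.log(3.0)) * math.log(2.0 / delta) / eta
    return math.sqrt(inner) * inf_norm_B / xi

def tail_bound_linf(eps, r, delta, xi, inf_norm_B):
    """Bound (3 ln r + 1.5 ln 3) * 2 ln(2/delta) * |||B|||_inf^2 / (eps xi)^2 from part (ii)."""
    ez = 3.0 * math.log(r) + 1.5 * math.log(3.0)
    return ez * 2.0 * math.log(2.0 / delta) * inf_norm_B ** 2 / (eps * xi) ** 2
```

Evaluating each tail bound at the matching minimum $\epsilon$ returns $\eta$, which is a quick consistency check
on the thresholds.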

6.3. Solving for the Optimal Workload Decomposition

The optimization program (i.e., Program (31)) for workload decomposition under $(\epsilon, \delta)$-differential
privacy is identical to the one under $\epsilon$-differential privacy (Program (9)), except that the former uses
$\mathcal{L}_2$ sensitivity in the constraints $\forall j,\ \sum_i L_{ij}^2 \le 1$, whereas the latter uses $\mathcal{L}_1$ sensitivity. Hence, to
solve Program (31), we simply adapt Algorithm 1 by modifying the parts related to these constraints.

The only major modification of Algorithm 1 lies in the projection step, which now needs to
project every column in L onto the $\mathcal{L}_2$ ball of radius 1, instead of the $\mathcal{L}_1$ unit ball used under
$\epsilon$-differential privacy. Specifically, the $\mathcal{L}_2$ ball projection is performed by solving the following
optimization program:

$$\min_{\hat{L}} \|\hat{L} - L\|_F^2, \quad \text{s.t. } \forall j,\ \sum_i \hat{L}_{ij}^2 \le 1 \qquad (33)$$

The above program can be decoupled into n independent $\mathcal{L}_2$ regularized sub-problems:

$$\arg\min_{\hat{l}} \|\hat{l} - l\|_2^2, \quad \text{s.t. } \sum_i \hat{l}_i^2 \le 1$$

where $l = L^{(t)}_j$, $j = 1, 2, \ldots, n$, is the $j$-th column of $L^{(t)}$. Such a projection can be computed
as $\hat{l} = \frac{l}{\max(1, \|l\|_2)}$. Therefore, the projection can be computed efficiently in linear time. Finally, by
adapting the proofs in Section 5.2, we can draw the conclusion that the modified Algorithm 1 for
optimizing the workload decomposition for LRM under $(\epsilon, \delta)$-differential privacy also converges
linearly to a local KKT optimal point. We omit the complete proofs for brevity.
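
The closed-form $\mathcal{L}_2$ projection is simple enough to show directly. The following Python sketch rescales
each column of L whose Euclidean norm exceeds 1 and leaves the other columns unchanged; the function
name and the nested-list matrix representation are assumptions made for illustration.

```python
import math

def project_columns_l2(L):
    """Project every column of L (a list of rows) onto the L2 ball of radius 1."""
    r, n = len(L), len(L[0])
    out = [row[:] for row in L]
    for j in range(n):
        norm = math.sqrt(sum(L[i][j] ** 2 for i in range(r)))
        factor = max(1.0, norm)  # columns already inside the ball are left untouched
        for i in range(r):
            out[i][j] = L[i][j] / factor
    return out
```

Each column costs O(r) operations, so the whole projection is linear in the size of L, in contrast to
the sort-based $\mathcal{L}_1$ projection.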
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7. EXPERIMENTS 

This section experimentally evaluates the effectiveness of LRM under the $\epsilon$- and $(\epsilon, \delta)$-differential
privacy definitions. For $\epsilon$-differential privacy, we compare LRM against six state-of-the-art
methods: the Laplace mechanism (LM) [Dwork et al. 2006c], Privlet (WM) [Xiao et al. 2010], the
hierarchical mechanism (HM) [Hay et al. 2010], exponential smoothing (ESM) [Yuan et al. 2012] (an
implementation of the approximate matrix mechanism [Li et al. 2010], described in Appendix A.1), the
adaptive mechanism (AM) [Li and Miklau 2012] (another implementation of the approximate matrix
mechanism [Li et al. 2010; Li and Miklau 2012], described in Appendix A.2), and the exponential
mechanism with multiplicative weights update (MWEM) [Hardt et al. 2012], whose performance
depends on the dataset. For $(\epsilon, \delta)$-differential privacy, we compare LRM against WM, HM, ESM,
AM, and the Gaussian mechanism (GM) [McSherry and Mironov 2009].

Implementations: For AM, we employ the Python implementation that can be obtained from
the authors' website (http://cs.umass.edu/~chaoli). We use the default stopping criterion
provided by the authors. For MWEM, we use Hardt et al.'s C# code listed in the Appendix
of [Hardt et al. 2012]. Note that MWEM needs to tune an additional parameter T, which
denotes the number of iterations, in order to ensure its performance. We follow the experimental
setting in [Hardt et al. 2012]. Specifically, we choose $T \in \{10, 12, 14, 16\}$ in our experiments
and report the values for the best setting of T in each case. (Strictly speaking, such parameter
tuning violates differential privacy; hence, the reported results are in favor of MWEM.)
We implemented all remaining methods in Matlab, and published all code online
(http://yuanganzhao.weebly.com/). We performed all experiments on a desktop PC with
an Intel quad-core 2.50 GHz CPU and 4 GB of RAM. In each experiment, every algorithm is
executed 20 times and the average performance is reported.

Datasets: We use four real-world data sets in our experiments [Hay et al. 2010; Xu et al. 2013;
Hardt et al. 2012]: Search Log, Net Trace, Social Network and UCI Adult. Search Log includes
search keyword statistics collected from Google Trends and America Online between 2004 and
2010. Each unit count is the number of appearances of a particular keyword. Social Network
contains information about users in a social network, where each unit count is the number of users with
a specific degree in the social graph. Net Trace is collected from a university intranet, where each
unit count is the number of TCP packets related to a particular IP address. The total numbers of
unit counts in Search Log, Net Trace and Social Network are 65,536, 32,768 and 11,342,
respectively. The UCI Adult data was extracted from the census bureau database of the U.S. Department
of Commerce. It contains 14 features, among which six are continuous and eight are categorical.
We use the following strategies to generate the sensitive data with varying domain size n. For the
{Search Log, Net Trace, Social Network} data sets, we transform the original counts into a vector
of fixed size n (the domain size), by merging consecutive counts in order. For the UCI Adult data set,
we only consider the combined {workclass, education, occupation, race} attributes (with a total
corresponding domain of size 8 x 16 x 14 x 5 = 8960) and uniformly choose n domains. The
counts of their corresponding records are used as the domain data. We observed that the
data sets {Search Log, Net Trace, Social Network} are all dense, with their sparsity exactly equal
to 100%, while the UCI Adult data set is sparse, with its sparsity roughly between 12% and 17%.
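
The merging of consecutive counts into a fixed domain size can be sketched as follows. The text does not
specify how bucket boundaries are chosen, so this Python example assumes contiguous buckets of
near-equal length; `merge_counts` is a hypothetical helper, not part of the published code.

```python
def merge_counts(counts, n):
    """Merge consecutive unit counts into n buckets, preserving order and the total."""
    if not 1 <= n <= len(counts):
        raise ValueError("n must be between 1 and len(counts)")
    size = len(counts)
    merged = []
    for b in range(n):
        lo = b * size // n        # first original index in bucket b
        hi = (b + 1) * size // n  # one past the last original index in bucket b
        merged.append(sum(counts[lo:hi]))
    return merged
```

For instance, merging six counts into three buckets sums each adjacent pair, and the total number of
records is unchanged by the transformation.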

Workloads: We generated four different types of workloads, namely WDiscrete, WRange,
WMarginal and WRelated. In WDiscrete, for each $W_{ij}$ (i.e., the coefficient of the i-th query on
the j-th unit count), we set $W_{ij} = 1$ with probability 0.02 and $W_{ij} = -1$ otherwise. In WRange,
each query $q_i$ sums the unit counts in a range $[s_i, t_i] \subseteq [1, n]$, i.e., $W_{ij} = 1$ for $s_i \le j \le t_i$, and
$W_{ij} = 0$ otherwise. The start and end points $s_i$ and $t_i$ of each query $q_i$ are randomly generated,
following the uniform distribution. WMarginal is used in [Li and Miklau 2012]; it contains queries
that are uniformly sampled from the set of all 2-way marginals. For WRelated, we generate s
independent linear counting queries (called base queries) with random weights following the (0, 1)-normal
distribution. Let A (of size s x n) denote the workload matrix of the s queries. We also generate
another matrix C of size m x s in a similar way. The workload matrix W is then the product of C
and A, i.e., the linear combination of base queries according to C.
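
A minimal Python sketch of three of these generators is given below (WMarginal is omitted because it
depends on the attribute structure of the data). The function names are hypothetical, and the way the
range end points are drawn (two uniform draws, then sorted) is an assumption; the text only states that
they follow the uniform distribution.

```python
import random

def w_discrete(m, n, rng, p=0.02):
    """WDiscrete: each coefficient is 1 with probability p and -1 otherwise."""
    return [[1 if rng.random() < p else -1 for _ in range(n)] for _ in range(m)]

def w_range(m, n, rng):
    """WRange: each query sums the unit counts in a random range [s_i, t_i] within [1, n]."""
    rows = []
    for _ in range(m):
        a, b = rng.randint(1, n), rng.randint(1, n)
        s, t = min(a, b), max(a, b)
        rows.append([1 if s <= j <= t else 0 for j in range(1, n + 1)])
    return rows

def w_related(m, n, s, rng):
    """WRelated: W = C A, with A (s x n) and C (m x s) drawn from the standard normal."""
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(s)]
    C = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(m)]
    return [[sum(C[i][k] * A[k][j] for k in range(s)) for j in range(n)] for i in range(m)]
```

By construction, the rank of a WRelated workload is at most s, which is what makes this workload a
natural test case for a low-rank decomposition.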

Parameters: We test the impact of five parameters in our experiments: $\gamma$, r, n, m and s. $\gamma$ is
the relaxation factor defined in Program (12). r is the number of intermediate queries in LRM, i.e.,
the number of columns in B (and also the number of rows in L). n is the number of unit counts
and m is the number of queries in the batch. Finally, s is the number of base queries during the
generation of WRelated. The ranges and defaults (shown in bold) of the parameters are summarized
in Table II. Moreover, we test three different values of the privacy budget: $\epsilon$ = 1, 0.1 and 0.01. For
$(\epsilon, \delta)$-differential privacy, following [Li and Miklau 2012], we set $\delta$ = 0.0001.


Table II. Parameters used in the experiments.

Parameter                                  Values
$\gamma$                                   0.0001, 0.001, 0.01, 0.1, 1, 10
r                                          {0.8, 1.0, 1.2, 1.4, 1.7, 2.1, 2.5, 3.0, 3.6} x rank(W)
n                                          128, 256, 512, 1024, 2048, 4096, 8192
m                                          64, 128, 256, 512, 1024
s (during the generation of WRelated)      {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} x min(m, n)


In the experiments, we measure the average squared error and computation time of the methods.
Specifically, the average squared error is the average squared $\mathcal{L}_2$ distance between the exact query
answers and the noisy answers. In the following, Section 7.1 examines the impact of $\gamma$ and r, which
are only used in LRM. The results provide important insights on how to set these two parameters to
maximize the utility of LRM. Then, Sections 7.2 to 7.5 compare LRM against existing methods.

7.1. Impact of $\gamma$ and r on LRM

In LRM, the relaxation factor $\gamma$ controls the difference between BL and W. In our first set of
experiments, we investigate the impact of $\gamma$ on the accuracy and efficiency of LRM. Figure 2 and
Figure 3 report the performance of LRM with varying values for $\gamma$ under $\epsilon$-differential privacy and
$(\epsilon, \delta)$-differential privacy respectively, using the Search Logs dataset. Results on other datasets lead
to similar conclusions, and are omitted for brevity.

The results in Figure 2 and Figure 3 show that when $\epsilon$ is relatively low (meaning strong
privacy), the error of LRM is not sensitive to $\gamma$ regardless of the workload, for all values of $\gamma$ tested in
the experiments ($10^{-4}$ to 10). Only when $\epsilon$ reaches 1 do large values of $\gamma$ (e.g., $\gamma \ge 1$) show a
negative impact on the performance of LRM. This negative effect is relatively small under $\epsilon$-differential
privacy; it is more pronounced under $(\epsilon, \delta)$-differential privacy. The reason is that the error of LRM
comes from two sources: the added noise and the difference between the decomposition BL and
the original workload W. When the privacy requirement is strong (i.e., when $\epsilon$ is relatively low,
or when $\epsilon$-differential privacy is used), the error introduced by inexact decomposition is negligible
compared to the noise added to satisfy differential privacy. Conversely, with a looser privacy
requirement (high $\epsilon$ and the $(\epsilon, \delta)$-differential privacy definition), the noise level becomes low, and the error
in decomposition becomes more evident. Nevertheless, when $\gamma \le 0.1$, its impact is insignificant in
all settings. Meanwhile, LRM runs much faster with a larger $\gamma$. Overall, $\gamma \le 0.1$ is a safe choice,
and a larger value of $\gamma$ is recommended for applications with strong privacy requirements. In the
following experiments, we fix $\gamma$ to 0.01.

r is another important parameter in LRM that determines the rank of the matrix BL that
approximates the workload W. r affects both the approximation accuracy and the optimization speed.
When r is too small, e.g., when r < rank(W), our optimization formulation may fail to find a
good approximation, leading to suboptimal accuracy for the query batch. On the other hand, an
overly large r leads to poor efficiency, as the search space expands dramatically. We thus test LRM
with varying r, by controlling the ratio of r to the actual rank rank(W), on the Search Log dataset.
We record the average squared error and running time of LRM for all the workloads under $\epsilon$- and
$(\epsilon, \delta)$-differential privacy, and report them in Figure 4 and Figure 5, respectively.

Fig. 2. Effect of relaxation parameter $\gamma$ on Search Logs under $\epsilon$-differential privacy

Fig. 3. Effect of relaxation parameter $\gamma$ on Search Logs under $(\epsilon, \delta)$-differential privacy

Fig. 4. Effect of r on Search Logs under $\epsilon$-differential privacy

Fig. 5. Effect of r on Search Logs under $(\epsilon, \delta)$-differential privacy

There are several important observations in Figure 4 and Figure 5. First, a value of r below
rank(W) leads to far worse accuracy (up to two orders of magnitude) compared to settings with
higher values of r. Second, the performance of LRM becomes stable when r exceeds 1.2 · rank(W)
for $\epsilon$-differential privacy, and 1.0 · rank(W) for $(\epsilon, \delta)$-differential privacy. This is because the
optimization formulation has enough freedom to find the optimal decomposition when r > rank(W).
For $(\epsilon, \delta)$-differential privacy, this result is expected, because any decomposition W = BL with
r > rank(W) can be transformed into a decomposition B'L' with r = rank(W), by projecting
the columns of L and the rows of B onto the range of L, which does not affect the $\mathcal{L}_2$-sensitivity of
B. Finally, the amount of computation for workload decomposition increases linearly with r (note
that both axes are in logarithmic scale). Thus, to balance the efficiency and effectiveness of LRM,
a good value for r is between rank(W) and 1.2 · rank(W). In subsequent experiments, we set
r = 1.2 · rank(W) and r = 1.0 · rank(W) for $\epsilon$- and $(\epsilon, \delta)$-differential privacy, respectively.


ACM Transactions on Database Systems, Vol. V, No. N, Article A, Publication date: January YYYY.

G. Yuan et al. 




Fig. 6. Effect of domain size n on workload WDiscrete under $\epsilon$-differential privacy with $\epsilon$ = 0.1

Fig. 7. Effect of domain size n on workload WRange under $\epsilon$-differential privacy with $\epsilon$ = 0.1

Fig. 8. Effect of domain size n on workload WMarginal under $\epsilon$-differential privacy with $\epsilon$ = 0.1

Fig. 9. Effect of domain size n on workload WRelated under $\epsilon$-differential privacy with $\epsilon$ = 0.1

7.2. Impact of Varying Domain Size n 

We now evaluate the accuracy of all mechanisms with varying domain size n. We
perform all experiments with $\epsilon$ = 0.1, since the specific value of $\epsilon$ has negligible impact on the
relative performance of different mechanisms. For $\epsilon$-differential privacy, we report the results of
all mechanisms on the 4 different workloads in Figures 6, 7, 8 and 9, respectively. On workloads
WMarginal and WRelated, the performance of AM and ESM is comparable to the naive Laplace
mechanism, and significantly worse than the other methods, sometimes by more than an order of
magnitude. This is mainly because the $\mathcal{L}_2$ approximation used by AM and ESM does not lead to a
good optimization of the actual objective function formulated using $\mathcal{L}_1$ sensitivity. On WDiscrete,
the Laplace mechanism outperforms all other mechanisms when the data is non-sparse and the domain
size is relatively small. This is in part due to the fact that the queries in WDiscrete are generally
independent when m > n. Since the other mechanisms do not gain from correlations among queries,
the Laplace mechanism is optimal in such a situation. Whereas all other data-independent mechanisms
incur an error linear in the domain size n, LRM's error stops increasing when the domain size
reaches 512. This is because LRM's error rate depends on the rank of the workload matrix W, which
is no larger than min(m, n). This explains the excellent performance of LRM in larger domains. On
WRange, the errors of WM and HM are smaller than that of the Laplace mechanism when the
domain size is no smaller than 512. Moreover, WM and HM perform better on WRange than on
the other workloads, since they are designed to optimize mainly for range queries. Nonetheless,
LRM's performance is significantly better than any of them, since it fully utilizes the correlations
between the range queries on large domains. On WMarginal and WRelated, LRM achieves the
best performance in all settings. The performance gap between LRM and other methods is over
two orders of magnitude when the domain size reaches 8192. Since WRelated naturally leads to a
low-rank workload matrix W, this result verifies LRM's vast benefit from exploiting the low-rank
property of the workload. Finally, we observe some interesting behaviors of the data-dependent
method MWEM. The error incurred by MWEM does not scale well with the domain size n on
non-sparse data sets. Moreover, MWEM performs comparably to LRM on Search Logs and Net Trace
when n is very large (n $\ge$ 4096). However, the performance of MWEM is rather unstable; it
incurs much larger error than LRM on Social Network and UCI Adult, in some cases by more than
two orders of magnitude.

Fig. 10. Effect of domain size n on workload WDiscrete under $(\epsilon, \delta)$-differential privacy with $\epsilon$ = 0.1 and $\delta$ = 0.0001

Fig. 12. Effect of domain size n on workload WMarginal under $(\epsilon, \delta)$-differential privacy with $\epsilon$ = 0.1 and $\delta$ = 0.0001

Fig. 13. Effect of domain size n on workload WRelated under $(\epsilon, \delta)$-differential privacy with $\epsilon$ = 0.1 and $\delta$ = 0.0001

Regarding $(\epsilon, \delta)$-differential privacy, we report the accuracy of all methods in Figures 10, 11,
12 and 13. LRM obtains the best performance in all settings, especially when n is large. Its
improvement over the naive Gaussian mechanism is over two orders of magnitude. AM and ESM have
similar accuracy. For range queries, the performance of ESM and AM is comparable to that of WM
and HM, which are optimized for range counts. However, the accuracy of AM and ESM is rather
unstable on workloads WRange and WMarginal. For ESM, this instability is caused by numerical
errors in the matrix inverse operations, which can be high when the final solution matrix is low-rank.
For AM, the problem lies in its post-processing step, which gives approximate solutions of
unstable quality. The performance of LRM, on the other hand, is consistently good in all settings.

7.3. Impact of Number of Queries m 

In this subsection, we test the impact of the query set cardinality m on the performance of the
mechanisms. We mainly focus on settings where the number of queries m is no larger than the
domain size n. For $\epsilon$-differential privacy, the accuracy results are reported in Figures 14, 15, 16 and
17. On WRange and WMarginal, LRM outperforms all other mechanisms when m is significantly
smaller than n. As m grows, the performance of all mechanisms on WRange tends to converge.
The degradation in the performance of LRM is due to the lack of the low-rank property when the batch
contains too many random range queries. When m is no less than 256, both WM and HM
achieve accuracy comparable to LRM, since they are optimized for range queries. On WDiscrete,
MWEM is comparable to LRM on the UCI Adult data set; one possible reason is that MWEM can
make use of the sparsity of the data on the WDiscrete workload. On the WRelated workload, the accuracy
of LRM is dramatically higher than that of the other methods, for all values of m. This is because the rank
of the WRelated workload is fixed to s, regardless of the number of queries. Finally, we observe that
on WDiscrete and WRange, while the performance of other mechanisms does not differ much from
data set to data set, the data-dependent method MWEM generally performs better on the UCI Adult dataset
than on other datasets, due to the high sparsity of UCI Adult.

For (ε, δ)-differential privacy, we report the results in Figures 18, 19, 20 and 21. We have the
following observations from these results. On the WDiscrete, WRange and WRelated workloads, WM
and HM improve upon the naive Gaussian mechanism (GM); however, on WMarginal, WM and HM
incur higher errors than GM. AM and ESM again exhibit similar performance, which is often better
than that of WM, HM, and GM. LRM consistently outperforms its competitors in all test cases.

7.4. Impact of Varying Query Rank s 

The previous experiments demonstrate LRM's substantial performance advantages when the workload
matrix has low rank. In this set of experiments, we manually control the rank of workload W to
verify this observation. Recall that the parameter s determines the size of the matrix $C_{m \times s}$ and the
size of the matrix $A_{s \times n}$ during the generation of the WRelated workload. When C and A contain
only independent rows/columns, s is exactly the rank of the workload matrix W = CA. In Figures
22 and 23, we vary s from 0.1 × min(m, n) to 1 × min(m, n).
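
As an illustration of the rank control described above, the following minimal Python sketch builds a WRelated-style workload W = CA and checks that its rank equals s. The Gaussian entries, the helper names and the elimination-based rank test are assumptions made for this example only; they are not taken from the experimental code.

```python
import random


def random_matrix(rows, cols, rng):
    """Dense matrix with i.i.d. standard normal entries (an assumed distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(X, Y):
    """Plain matrix product X @ Y for lists of lists."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def numerical_rank(M, tol=1e-8):
    """Rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][c] / A[rank][c]
            for j in range(c, cols):
                A[r][j] -= factor * A[rank][j]
        rank += 1
    return rank


def related_workload(m, n, s, seed=0):
    """W = C A with C of size m x s and A of size s x n, so rank(W) <= s."""
    rng = random.Random(seed)
    C = random_matrix(m, s, rng)
    A = random_matrix(s, n, rng)
    return matmul(C, A)


W = related_workload(m=12, n=16, s=3)
print(numerical_rank(W))
```

With continuous random entries, C and A have independent columns and rows with probability one, so the computed rank coincides with s.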
Fig. 14. Effect of number of queries m on workload WDiscrete under ε-differential privacy with ε = 0.1

Fig. 15. Effect of number of queries m on workload WRange under ε-differential privacy with ε = 0.1

Fig. 16. Effect of number of queries m on workload WMarginal under ε-differential privacy with ε = 0.1

Fig. 17. Effect of number of queries m on workload WRelated under ε-differential privacy with ε = 0.1


For ε-differential privacy, LRM outperforms all other methods by at least one order of magnitude
when s is low. With increasing s, the performance gap gradually closes. This phenomenon confirms
that the low-rank property is the main reason behind LRM's advantages. For (ε, δ)-differential
privacy, LRM also gives the best performance in all test cases; its performance advantage decreases
with s, though at a much slower rate than in the case of ε-differential privacy.

7.5. Scalability of the Low-Rank Mechanism 

Finally, we demonstrate the efficiency and scalability of LRM under ε- and (ε, δ)-differential privacy.
The running time of LRM is dominated by the optimization module that solves the best workload
Fig. 18. Effect of number of queries m on workload WDiscrete under (ε, δ)-differential privacy with ε = 0.1 and δ = 0.0001

Fig. 19. Effect of number of queries m on workload WRange under (ε, δ)-differential privacy with ε = 0.1 and δ = 0.0001

Fig. 20. Effect of number of queries m on workload WMarginal under (ε, δ)-differential privacy with ε = 0.1 and δ = 0.0001. Panels: (a) Search Logs, (b) Net Trace, (c) Social Network, (d) UCI Adult.

Fig. 21. Effect of number of queries m on workload WRelated under (ε, δ)-differential privacy with ε = 0.1 and δ = 0.0001. Panels: (a) Search Logs, (b) Net Trace, (c) Social Network, (d) UCI Adult.

decomposition, which is independent of the dataset. In Figure 24 and Figure 25, we vary the domain
size n from 128 to 8192 and the number of queries m from 64 to 256, respectively, and report the
total running time of LRM for the 4 different types of workloads in our experiments. LRM scales
roughly linearly with the domain size n and the number of queries m (note that both axes are in
logarithmic scale). Moreover, we observe that for workload WRelated, LRM runs faster when the rank
s of the workload is lower, given the same values of n and m. LRM under (ε, δ)-differential privacy
is slightly more efficient than under ε-differential privacy. This is expected, since we set a smaller
Fig. 22. Effect of parameter s under ε-differential privacy with ε = 0.1

Fig. 24. Scalability of LRM under ε-differential privacy. Panels: (a) on workload WDiscrete, (b) on workload WRange, (c) on workload WMarginal, (d) on workload WRelated.

Fig. 25. Scalability of LRM under (ε, δ)-differential privacy

value of r for (ε, δ)-differential privacy. In all settings, LRM terminates within 20 minutes for
each experiment. In practice, this computation time pays off, as LRM achieves significantly higher
accuracy than existing methods.

8. CONCLUSIONS AND FUTURE WORK 

This paper presents the low-rank mechanism (LRM), an optimization framework that minimizes
the overall error of the results for a batch of linear queries under differential privacy. The
proposed method is the first practical method for a large number of linear queries, with an efficient and
effective implementation using well-established optimization techniques. Experiments show that
LRM significantly outperforms other state-of-the-art differentially private query processing
mechanisms, often by orders of magnitude. The current design of LRM focuses on exploiting the
correlations between different queries. One interesting direction for future work is to further optimize
LRM by also utilizing the correlations between data values, e.g., as is done in
[Xu et al. 2013; Rastogi and Nath 2010; Li et al. 2011].
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A. IMPLEMENTATION OF THE APPROXIMATE MATRIX MECHANISM 

Li et al. [Li et al. 2010] describe two implementations of the Matrix Mechanism, which optimizes
the accuracy of a batch of linear counting queries under ε-differential privacy. The first directly
solves the optimization program of the matrix mechanism, which can be formulated as follows:

\[ \min_{A} \; \|A\|_{1,\infty}^{2} \, \mathrm{tr}\left( W A^{\dagger} A^{\dagger T} W^{T} \right) \tag{34} \]

where $A^{\dagger}$ denotes the pseudo-inverse of matrix $A$, and $\|A\|_{1,\infty}$ is the maximum $\ell_1$ norm of the column
vectors of $A$. It is shown that this problem can be formulated as a semidefinite program with a rank
constraint and solved by a sequence of semidefinite programs. However, it incurs a high computational
overhead, which is prohibitively expensive even for moderate-sized workloads. The second
implementation solves an approximate version of Program (34), as follows:

\[ \min_{A} \; \|A\|_{2,\infty}^{2} \, \mathrm{tr}\left( W A^{\dagger} A^{\dagger T} W^{T} \right) \tag{35} \]

where $\|A\|_{2,\infty}$ is the maximum $\ell_2$ norm of the column vectors of $A$. Under ε-differential privacy,
Program (35) is essentially the $\ell_2$ approximation of the original matrix mechanism formulation. The
solution to Program (35) presented in [Li et al. 2010], however, is rather complicated and incurs
high computational costs. In the following two subsections, we describe two implementations of the
approximate matrix mechanism, the exponential smoothing mechanism (ESM) [Yuan et al. 2012]
and the adaptive mechanism (AM) [Li and Miklau 2012], for solving Program (35).
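
The sensitivity terms in Programs (34) and (35) differ only in the column norm that is maximized. The following minimal Python sketch computes both for a small strategy matrix; the matrix values and the helper name max_column_norm are hypothetical and serve only as an illustration.

```python
import math


def max_column_norm(A, p):
    """Largest l_p norm over the columns of A, for p = 1 or p = 2."""
    cols = range(len(A[0]))
    if p == 1:
        return max(sum(abs(row[j]) for row in A) for j in cols)
    if p == 2:
        return max(math.sqrt(sum(row[j] ** 2 for row in A)) for j in cols)
    raise ValueError("p must be 1 or 2")


# Hypothetical 3 x 3 strategy matrix.
A = [[1.0, 0.0, 2.0],
     [1.0, 3.0, 0.0],
     [0.0, 4.0, 2.0]]

l1_sensitivity = max_column_norm(A, 1)  # maximum l1 norm of a column, as in Program (34)
l2_sensitivity = max_column_norm(A, 2)  # maximum l2 norm of a column, as in Program (35)
print(l1_sensitivity, l2_sensitivity)
```

Since the ℓ2 norm of a vector never exceeds its ℓ1 norm, the second quantity is never larger than the first.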

A.1. Exponential Smoothing Mechanism 

In this subsection, we present a simpler and more efficient solution, referred to as the exponential
smoothing mechanism (ESM), based on the methodology of exponential smoothing. Observe that
$\|A\|_{2,\infty}^{2} = \max(\mathrm{diag}(A^{T} A))$ and $A^{\dagger} A^{\dagger T} = (A^{T} A)^{-1}$ (when $A$ has full column rank). Letting $M = A^{T} A$,
we reformulate Program (35) as the following positive definite optimization problem:

\[ \min_{M} \; G(M) = \max(\mathrm{diag}(M)) \cdot \mathrm{tr}\left( W M^{-1} W^{T} \right) \quad \text{s.t.} \quad M \succ 0 \]

$A$ is then given by $A = \sum_{i} \sqrt{\lambda_i} \, v_i v_i^{T}$, where $\lambda_i$ and $v_i$ are the $i$-th eigenvalue and eigenvector of $M$,
respectively. Calculating the second term $\mathrm{tr}(W M^{-1} W^{T})$ is relatively straightforward. Since it is
smooth, its gradient can be computed as $-M^{-1} W^{T} W M^{-1}$. However, calculating the first term
$\max(\mathrm{diag}(M))$ is harder since it is non-smooth. Fortunately, inspired by [d'Aspremont et al. 2007],
we can still use a logarithmic and exponential function to approximate this term.

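Approximate the maximum positive number: Since $M$ is positive definite, $v = \mathrm{diag}(M) > 0$.
We let $\mu > 0$ be a sufficiently small parameter and define:

\[ f_{\mu}(v) = \mu \log \left( \sum_{i=1}^{n} \exp\left( \frac{v_i}{\mu} \right) \right) \tag{36} \]

We then have $\max(v) \le f_{\mu}(v) \le \max(v) + \mu \log n$. The gradient of the objective function in
Equation (36) with respect to $v$ can be computed as:

\[ \frac{\partial f_{\mu}}{\partial v_i} = \left( \sum_{j=1}^{n} \exp\left( \frac{v_j}{\mu} \right) \right)^{-1} \exp\left( \frac{v_i}{\mu} \right) \tag{37} \]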


We use the Matlab notations in this paper. When A is a matrix, diag(A) denotes a column vector formed from the main
diagonal of A; when A is a vector, diag(A) denotes a diagonal matrix with A in the main diagonal entries. Moreover,
max(·) retrieves the largest element of an array.




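Since the second-order Hessian matrix of the objective function in Equation (36) can be computed
as:

\[ \frac{\partial^{2} f_{\mu}}{\partial v \, \partial v^{T}} = \frac{\mathrm{diag}(\exp(v/\mu))}{\mu \sum_{i=1}^{n} \exp(v_i/\mu)} - \frac{\exp(v/\mu) \exp(v/\mu)^{T}}{\mu \left( \sum_{i=1}^{n} \exp(v_i/\mu) \right)^{2}} \triangleq S - T \]

we have the following upper bound on the spectral norm of the Hessian:
$|||S - T|||_2 \le |||S|||_2 + |||T|||_2 \le \frac{1}{\mu} + \frac{1}{\mu} = \frac{2}{\mu}$. Therefore, the gradient of $f_{\mu}(v)$ is Lipschitz continuous with parameter
$\omega = \frac{2}{\mu}$. If we set $\mu = \frac{\epsilon}{\log n}$, this becomes a uniform $\epsilon$-approximation of $\max(v)$ with a Lipschitz
continuous gradient with constant $\omega = \frac{2}{\mu} = \frac{2 \log n}{\epsilon}$. In our experiments, we use $\mu$ = [...].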

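To mitigate the problems with large numbers, using the properties of the logarithmic and exponential
functions, we can rewrite Equation (36) and Equation (37) as:

\[ f_{\mu}(v) = \max(v) + \mu \log \left( \sum_{i=1}^{n} \exp\left( \frac{v_i - \max(v)}{\mu} \right) \right) \]

\[ \frac{\partial f_{\mu}}{\partial v_i} = \left( \sum_{j=1}^{n} \exp\left( \frac{v_j - \max(v)}{\mu} \right) \right)^{-1} \exp\left( \frac{v_i - \max(v)}{\mu} \right) \]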
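
The shifted forms can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names, the test vector and the value of μ are assumptions chosen for the example and are not the settings used in the experiments.

```python
import math


def smooth_max(v, mu):
    """f_mu(v) = max(v) + mu * log(sum_i exp((v_i - max(v)) / mu))."""
    top = max(v)
    return top + mu * math.log(sum(math.exp((x - top) / mu) for x in v))


def smooth_max_grad(v, mu):
    """Gradient of f_mu: a softmax of v / mu, evaluated after shifting by max(v)."""
    top = max(v)
    weights = [math.exp((x - top) / mu) for x in v]
    total = sum(weights)
    return [w / total for w in weights]


# Large entries: exp(v_i / mu) would overflow without the shift by max(v).
v = [1000.0, 999.5, 3.0, 0.25]
mu = 0.05

value = smooth_max(v, mu)
grad = smooth_max_grad(v, mu)
print(value, grad)
```

Every exponent in the shifted form is non-positive, so no term can overflow, and the value stays within the interval [max(v), max(v) + μ log n].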


By the chain rule of differentiation in calculus, the gradient of $G(M)$ can be computed as:

\[ \frac{\partial G}{\partial M} = \mathrm{diag}\left( \frac{\partial f_{\mu}}{\partial v} \right) \cdot \mathrm{tr}\left( W M^{-1} W^{T} \right) - f_{\mu}(v) \cdot M^{-1} W^{T} W M^{-1} \]

Here $\mathrm{diag}(\frac{\partial f_{\mu}}{\partial v})$ denotes a diagonal matrix with $\frac{\partial f_{\mu}}{\partial v} \in \mathbb{R}^{n}$ as the main diagonal entries. This
formulation allows us to run the non-monotone spectral projected gradient descent algorithm
[Birgin et al. 2000] on the cone of positive semidefinite matrices. We use eigenvalue decomposition to
trim the negative eigenvalues to maintain positive semidefiniteness of $M$, and iteratively improve the
result. After the algorithm terminates, we return the final $M$ as the solution to the program.


A.2. Adaptive Mechanism 

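In this subsection, we briefly review the adaptive mechanism (AM) proposed in
[Li and Miklau 2012], a heuristic solution for the problem in Program (35). AM considers
the following optimization problem:

\[ \min_{\lambda \in \mathbb{R}^{n}} \; \sum_{i=1}^{n} \frac{d_i^{2}}{\lambda_i^{2}} \quad \text{s.t.} \quad (Q \circ Q)(\lambda \circ \lambda) \le \mathbf{1}_m \tag{38} \]

where $Q$ is from the singular value decomposition of the workload matrix $W = QDP$ with
$Q \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{n \times n}$, $P \in \mathbb{R}^{n \times n}$, and $d = \mathrm{diag}(D) \in \mathbb{R}^{n}$, i.e., the diagonal values of $D$.
Furthermore, $\circ$ is the Hadamard (entry-wise) product, and $\mathbf{1}_m$ is a column vector with all entries equal to
one. AM then computes the strategy matrix $A$ by

\[ A = Q \, \mathrm{diag}(\lambda) \in \mathbb{R}^{m \times n} \tag{39} \]

where $\mathrm{diag}(\lambda)$ is a diagonal matrix with $\lambda$ as its diagonal values.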

The optimization problem in (38) is non-convex since it contains quadratic terms in both the
objective and the constraint. By the change of variable $\lambda \circ \lambda = u$, we obtain the following equivalent
optimization problem:

\[ \min_{u \in \mathbb{R}^{n}} \; \sum_{i=1}^{n} \frac{d_i^{2}}{u_i}, \quad \text{s.t.} \quad (Q \circ Q) u \le \mathbf{1}_m, \; u \ge 0. \tag{40} \]
By introducing an auxiliary variable $v \in \mathbb{R}^{n}$, the optimization above can be reformulated as the
following semidefinite program:

\[ \min_{u \in \mathbb{R}^{n}, \, v \in \mathbb{R}^{n}} \; \sum_{i=1}^{n} v_i d_i^{2}, \quad \text{s.t.} \quad (Q \circ Q) u \le \mathbf{1}_m, \quad \begin{bmatrix} u_i & 1 \\ 1 & v_i \end{bmatrix} \succeq 0, \; \forall i \in [n] \tag{41} \]

which can be solved by off-the-shelf interior-point solvers.
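
To see why the auxiliary variable reproduces the objective $\sum_i d_i^2 / u_i$, it may help to spell out the 2 × 2 positive semidefiniteness condition. The short derivation below is a worked illustration based on the standard characterization of 2 × 2 positive semidefinite matrices (nonnegative diagonal and nonnegative determinant); it is not part of the original text.

```latex
\[
\begin{bmatrix} u_i & 1 \\ 1 & v_i \end{bmatrix} \succeq 0
\;\Longleftrightarrow\;
u_i \ge 0,\quad v_i \ge 0,\quad u_i v_i - 1 \ge 0
\;\Longrightarrow\;
u_i > 0 \ \text{and}\ v_i \ge \frac{1}{u_i}.
\]
\[
\text{Since } d_i^{2} \ge 0, \text{ minimizing } \sum_{i=1}^{n} v_i d_i^{2}
\text{ drives each } v_i \text{ down to } \frac{1}{u_i},
\text{ so the optimal value equals } \sum_{i=1}^{n} \frac{d_i^{2}}{u_i}.
\]
```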


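ALGORITHM 4: Adaptive Mechanism for Approximately Solving Problem (35)

1: Input: workload matrix $W \in \mathbb{R}^{m \times n}$.

2: Compute the SVD of $W$, $W = QDP$, to obtain $Q \in \mathbb{R}^{m \times n}$ and $d = \mathrm{diag}(D) \in \mathbb{R}^{n}$.

3: Solve the semidefinite program in Equation (41) and obtain $u$.

4: Compute $A' = Q \, \mathrm{diag}(\sqrt{u}) \in \mathbb{R}^{m \times n}$ and the diagonal matrix $\mathrm{diag}(\max(o) \mathbf{1}_n - o) \in \mathbb{R}^{n \times n}$, where
$o_i = \|A'_i\|_2$, $i = 1, \ldots, n$, $o \in \mathbb{R}^{n}$.

5: Output the strategy matrix $A$:

\[ A = \begin{bmatrix} A' \\ \mathrm{diag}(\max(o) \mathbf{1}_n - o) \end{bmatrix} \]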

The complete AM algorithm is summarized in Algorithm 4. Given a workload matrix W, AM
automatically selects a set of "eigen-queries" Q and uses a nonnegative combination of Q
to compute the strategy matrix A with respect to the workload matrix. First, in Step 2, the algorithm
performs the SVD of W to derive the eigen-queries Q. Based on the eigen-queries Q,
AM aims to find the optimal weights λ (λ ≥ 0), with λ = √u, by solving the semidefinite
program in Step 3. In Step 4, the constructed matrix A' is a candidate strategy, but it may
have one or more columns whose norm is less than the sensitivity. In this case, AM adds queries
or completes columns in order to further reduce the expected error without raising the sensitivity.
Essentially, AM searches over a reduced subspace of A. Hence, the candidate strategy matrix A'
obtained from the semidefinite program in Step 3 is not guaranteed to be the optimal strategy, since it
is limited to a weighted nonnegative combination of the fixed eigen-queries Q in Equation (39).
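
The scaling and column-completion steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the weights u are hypothetical values rather than the output of a semidefinite solver, the matrix Q is a hand-picked orthogonal matrix, and the extra diagonal rows are assumed to be stacked below A'. The function names are chosen for this example.

```python
import math


def column_norms(A):
    """l2 norm of every column of A."""
    return [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0]))]


def complete_strategy(Q, u):
    """Scale the eigen-queries by sqrt(u), then pad the columns with diagonal rows."""
    m, n = len(Q), len(Q[0])
    # A' = Q diag(sqrt(u)): column i of Q is multiplied by sqrt(u_i).
    A_prime = [[Q[r][i] * math.sqrt(u[i]) for i in range(n)] for r in range(m)]
    o = column_norms(A_prime)
    top = max(o)
    # Diagonal block diag(max(o) 1_n - o); assumed to be appended as extra rows.
    extra = [[(top - o[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return A_prime + extra, o


# Hypothetical inputs: an orthogonal Q and weights u (not produced by an SDP solver).
Q = [[0.6, 0.0, 0.8],
     [0.8, 0.0, -0.6],
     [0.0, 1.0, 0.0]]
u = [1.0, 0.25, 0.49]

A, o = complete_strategy(Q, u)
print(max(o), max(column_norms(A)))
```

Under this stacking assumption, a column with norm $o_i$ ends up with squared norm $o_i^2 + (\max(o) - o_i)^2 \le \max(o)^2$, so the largest column norm, and hence the ℓ2 sensitivity, is unchanged while shorter columns gain additional queries.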

B. ASYMPTOTIC ERROR BOUNDS FOR LRM 
B.1. LRM Error Bounds under ε-Differential Privacy

In this subsection, we prove the lower bound and upper bound of the error incurred by the optimal
workload decomposition solved from Program (9), and analyze the gap between the two bounds.
First, we establish an error upper bound for LRM in the following lemma.

Lemma B.1. Error upper bound under ε-differential privacy. Given a workload matrix W
of rank s with singular values $\{\lambda_1, \ldots, \lambda_s\}$, an upper bound of the expected squared error of
$M_{LRM,\epsilon}(Q, D)$ w.r.t. the optimal decomposition $W = B^{*}L^{*}$ is $2 \sum_{k=1}^{s} \lambda_k^{2} / \epsilon^{2}$.

Proof. Consider the naive method NOD, which can be considered as a special case of LRM by
setting $B = W$ and $L = I$ (i.e., the identity matrix). Clearly, $\Delta(L) = 1$. According to the lemma on the
expected squared error of LRM under ε-differential privacy, the expected squared error of this
decomposition is:

\[ 2 \Phi(B) \Delta(L)^{2} / \epsilon^{2} = 2 \|W\|_F^{2} / \epsilon^{2} = 2 \sum_{k=1}^{s} \lambda_k^{2} / \epsilon^{2} \]

Since the optimal decomposition incurs no more error than this feasible one, we reach the
conclusion of the lemma. □
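
The closed form for NOD can be checked empirically. The following minimal Python sketch adds Laplace noise of scale 1/ε to each of the n unit counts (this noise calibration, the toy workload and all function names are assumptions for the illustration) and compares the average squared error of the noisy answers against 2‖W‖_F²/ε².

```python
import random


def frobenius_sq(W):
    """Squared Frobenius norm of W."""
    return sum(x * x for row in W for x in row)


def nod_expected_error(W, eps):
    """Closed form 2 * ||W||_F^2 / eps^2 for B = W, L = I."""
    return 2.0 * frobenius_sq(W) / eps ** 2


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def nod_empirical_error(W, eps, trials, seed=0):
    """Average of ||W (x + noise) - W x||^2 = ||W noise||^2 over many trials."""
    rng = random.Random(seed)
    n = len(W[0])
    total = 0.0
    for _ in range(trials):
        noise = [laplace(1.0 / eps, rng) for _ in range(n)]
        total += sum(sum(w * z for w, z in zip(row, noise)) ** 2 for row in W)
    return total / trials


# Toy workload with three counting queries over a domain of size four.
W = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 1]]
eps = 0.5

closed_form = nod_expected_error(W, eps)
empirical = nod_empirical_error(W, eps, trials=20000)
print(closed_form, empirical)
```

The error does not depend on the data vector x, because the noise is added before the workload is applied and cancels out of the difference.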
Next we derive a lower bound on the squared error for linear counting queries under ε-differential
privacy, using geometric analysis under orthogonal projection [Hardt and Talwar 2010]. To do this,
we first present the following lemma, which is used later in our geometric analysis.

Lemma B.2. For all orthogonal $V \in \mathbb{R}^{s \times n}$, we have the following inequality:

\[ \mathrm{Vol}(V B_1^{n}) \ge \mathrm{Vol}(B_2^{s}) \cdot n^{-s/2} \]

where $\mathrm{Vol}(B_2^{s})$ denotes the volume of the unit Euclidean ball, and $\mathrm{Vol}(V B_1^{n})$ denotes the volume of
the unit ball of the $\ell_1$ norm on $\mathbb{R}^{n}$ after the orthogonal transformation under $V$.

Proof. By the Cauchy-Schwarz inequality we have $\|x\|_1 \le \sqrt{n} \, \|x\|_2$ for all $x \in \mathbb{R}^{n}$; therefore, the
n-dimensional $\ell_1$ ball contains an $\ell_2$ ball of radius $n^{-1/2}$, i.e., $B_1^{n} \supseteq n^{-1/2} B_2^{n}$. Given an orthogonal
transformation $V$, we obtain $V B_1^{n} \supseteq n^{-1/2} V B_2^{n}$. Moreover, because the orthogonal projection of a
Euclidean ball is a lower-dimensional Euclidean ball of the same radius, it holds that $n^{-1/2} V B_2^{n} =
n^{-1/2} B_2^{s}$. Therefore, the volume of $V B_1^{n}$ is bounded from below by:

\[ \mathrm{Vol}(V B_1^{n}) \ge \mathrm{Vol}(n^{-1/2} B_2^{s}) = \mathrm{Vol}(B_2^{s}) \cdot n^{-s/2}. \]

□
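
The inequality can be checked numerically in the special case s = n, where V is a square orthogonal matrix and therefore preserves the volume of the ℓ1 ball. The sketch below relies on the standard closed-form volumes 2^n / n! for the unit ℓ1 ball and π^{n/2} / Γ(1 + n/2) for the unit Euclidean ball; the function names are chosen for this example.

```python
import math


def vol_l1_ball(n):
    """Volume of the unit l1 ball (cross-polytope) in R^n: 2^n / n!."""
    return 2.0 ** n / math.factorial(n)


def vol_l2_ball(n):
    """Volume of the unit Euclidean ball in R^n: pi^(n/2) / Gamma(1 + n/2)."""
    return math.pi ** (n / 2) / math.gamma(1 + n / 2)


def lemma_lower_bound(n):
    """Right-hand side of the lemma when s = n: Vol(B_2^n) * n^(-n/2)."""
    return vol_l2_ball(n) * n ** (-n / 2)


# Ratio of the l1-ball volume to the lower bound, for dimensions 1 through 20.
ratios = {n: vol_l1_ball(n) / lemma_lower_bound(n) for n in range(1, 21)}
for n in (1, 2, 5, 10, 20):
    print(n, ratios[n])
```

The ratio equals one for n = 1, where both balls are the interval [-1, 1], and grows with n, which indicates how much slack the containment argument leaves in higher dimensions.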

We are now ready to prove the error lower bound of LRM.

Lemma B.3. Error Lower Bound under ε-differential privacy. Given a workload matrix W
of rank s with singular values $\{\lambda_1, \ldots, \lambda_s\}$, the expected squared error of any ε-differential privacy
mechanism is at least



Proof. Corollary 3.4 in [Hardt and Talwar 2010] proves that any ε-differential privacy
mechanism for linear counting queries incurs expected squared error no less than:

\[ \Omega\left( k^{3} \left( \mathrm{Vol}(P W B_1^{n}) \right)^{2/k} / \epsilon^{2} \right) \]

In the formula above, $B_1^{n}$ is the $\ell_1$ unit ball, and $\mathrm{Vol}(P W B_1^{n})$ is the volume of the unit ball
after the linear transformation $PW$, in which $P$ is any orthogonal linear transformation matrix
onto $\mathbb{R}^{k}$. To prove the lemma, we take $k = s$ and construct the orthogonal transformation $P = U^{T}$,
where $U$ is obtained from the SVD of $W$ ($W = U \Sigma V$). According to the properties
of the SVD, $U^{T}U$ and $VV^{T}$ are identity matrices. Thus, we have
$\mathrm{Vol}(P W B_1^{n}) = \mathrm{Vol}(U^{T} U \Sigma V B_1^{n}) = \mathrm{Vol}(\Sigma V B_1^{n}) = \mathrm{Vol}(V B_1^{n}) \prod_{i=1}^{s} \lambda_i$. The last equality holds due
to Lemma 7.5 in [Hardt and Talwar 2010]. Consider the convex body $V B_1^{n}$. By Lemma B.2,
its volume has the lower bound $\mathrm{Vol}(B_2^{s}) \cdot n^{-s/2}$. Note that $\mathrm{Vol}(B_2^{s})$ can be computed using the Gamma
function [Ball 1997]: $\mathrm{Vol}(B_2^{s}) = \frac{\pi^{s/2}}{\Gamma(1 + s/2)}$. Using Stirling's formula, we know that $\Gamma(1 + s/2)$ is roughly
$\sqrt{2\pi} \, e^{-s/2} (s/2)^{s/2 + 1/2}$, so that $\mathrm{Vol}(B_2^{s})$ is roughly $\left( \frac{2 \pi e}{s} \right)^{s/2}$. Substituting these estimates into
the bound above, we reach the conclusion of the lemma. □

Next we compare the error upper and lower bounds. The analysis involves a matrix-theory
concept called the generalized condition number.

[Hardt and Talwar 2010] used absolute errors, from which we derived the squared errors.
Definition B.4. Generalized condition number. Given a workload matrix W, the generalized
condition number $\kappa(W)$ of W is defined as the product of the spectral norm of W and that of its
pseudo-inverse or, equivalently, the ratio of the largest singular value of W to its smallest nonzero
singular value [Chen and Dongarra 2005; Beltrán 2011]:

\[ \kappa(W) = |||W|||_2 \cdot |||W^{\dagger}|||_2 = \frac{\lambda_1}{\lambda_s} \]

where $\lambda_1$ and $\lambda_s$ are the largest and the smallest nonzero singular values of W. Note that we always have $\kappa(W) \ge 1$.

Theorem B.5. When s > 5, the gap between the upper and lower bounds of the error incurred
by mechanism $M_{LRM,\epsilon}(Q, D)$ with the optimal decomposition $W = B^{*}L^{*}$ is $O\left( (\kappa(W))^{2} \frac{n}{s} \right)$.

Proof. The theorem is established by comparing the upper and lower bounds in Lemmas B.1
and B.3 as follows.


2ELi 




< 


2ELiA? 


2/s 


< 


< 


2nsXl 


2nK{W)‘^ 


The last inequality holds due to the fact that $s! \le (s/2)^{s}$ when s > 5. Note that all the inequalities
above are tight, and the equalities hold when $\kappa(W) = 1$, i.e., $\lambda_1 = \lambda_2 = \ldots = \lambda_s$. □


From the theorem above, we draw the following interesting observations. (i) When the rank of the
matrix is low (i.e., s is small) and the batch queries are highly correlated ($\kappa(W) \gg 1$), the ratio
of the upper bound to the lower bound is large, meaning that LRM can potentially achieve lower
error than NOD. (ii) Conversely, when the matrix has full rank ($s \to n$ and $n < m$) and
the batch queries are almost random or independent ($\kappa(W) \to 1$), the achievable error rate of
LRM converges to the upper error bound obtained by NOD. Therefore, in this situation, NOD might
be good enough and no sophisticated algorithm is needed, which is validated by the experimental
results in Section 7.3. These results are consistent with the work of [Ghosh et al. 2012], who show
that the Laplace mechanism is optimal in a strong sense when answering a single linear query.


B.2. LRM Error Bounds under (ε, δ)-Differential Privacy

We first derive an upper bound for the error of LRM. Unlike the case of ε-differential privacy, we
have a tighter error upper bound than that obtained by naive methods. We introduce the concept of
p-coherence of a matrix, which is similar to the μ-coherence [Candès and Recht 2009] and C-coherence
[Hardt and Roth 2012] of a matrix in the low-rank optimization literature.

Definition B.6. p-coherence of a matrix. Given a matrix W with SVD
$W = U \Sigma V$, where $U \in \mathbb{R}^{m \times s}$, $\Sigma \in \mathbb{R}^{s \times s}$, $V \in \mathbb{R}^{s \times n}$, we say the matrix W is p-coherent if

\[ p(W) = \max_{i} \|V_i\|_2, \quad i = 1, \ldots, n \]

where $V_i$ is the i-th column of $V$. Note that we have $0 < p(W) \le 1$.
Lemma B.7. Error Upper Bound under (ε, δ)-differential privacy. Given a workload matrix
W of rank s with singular values $\{\lambda_1, \ldots, \lambda_s\}$, an upper bound of the expected squared error of
$M_{LRM,(\epsilon,\delta)}(Q, D)$ w.r.t. the optimal decomposition $W = B^{*}L^{*}$ is $(p(W))^{2} \sum_{k=1}^{s} \lambda_k^{2} / h(\epsilon, \delta)^{2}$.

Proof. To prove the lemma, we perform the SVD of W, obtaining $W = U \Sigma V$.
Then, we build a decomposition $B = p(W) U \Sigma$ and $L = \frac{1}{p(W)} V$. This is a valid decomposition of
W, because $BL = p(W) U \Sigma \cdot \frac{1}{p(W)} V = U \Sigma V = W$.

Next we prove that $\Delta(L) = 1$. According to the properties of the SVD and the definition of
$p(W)$ in Definition B.6, for every column $V_j$ in $V$ we have $\|V_j\|_2 \le p(W)$, with equality for at
least one column. Therefore, $\Delta(L) = \max_j \|L_j\|_2 = \max_j \frac{1}{p(W)} \|V_j\|_2 = 1$.

The expected squared error of this decomposition is then:

\[
\begin{aligned}
\Phi(B) \Delta(L)^{2} / h(\epsilon, \delta)^{2} &= \mathrm{tr}(B^{T} B) / h(\epsilon, \delta)^{2} \\
&= \mathrm{tr}\left( (p(W) U \Sigma)^{T} (p(W) U \Sigma) \right) / h(\epsilon, \delta)^{2} \\
&= p(W)^{2} \, \mathrm{tr}\left( \Sigma^{T} U^{T} U \Sigma \right) / h(\epsilon, \delta)^{2} \\
&= p(W)^{2} \sum_{k=1}^{s} \lambda_k^{2} / h(\epsilon, \delta)^{2}
\end{aligned}
\]

We thus reach the conclusion of the lemma. □

Note that since $p(W) \le 1$, the above error bound is no worse than the error obtained by NOD.
Meanwhile, the proof essentially describes another simple solution whose accuracy is no worse than
that of NOD.

We now focus on the error lower bound of LRM under (ε, δ)-differential privacy. This has already
been studied in [Li and Miklau 2013], and we summarize their results with our notations in the
following lemma.

Lemma B.8. Error Lower Bound under (ε, δ)-differential privacy [Li and Miklau 2013].
Given a workload matrix W of rank s with singular values $\{\lambda_1, \ldots, \lambda_s\}$, the expected squared
error of $M_{LRM,(\epsilon,\delta)}(Q, D)$ w.r.t. the optimal decomposition $W = B^{*}L^{*}$ is at least

\[ \frac{1}{n \, h(\epsilon, \delta)^{2}} \left( \sum_{k=1}^{s} \lambda_k \right)^{2} \]

The proof of the above result in [Li and Miklau 2013] is rather complicated. In the following we
provide a simple proof.

Proof.

\[
\begin{aligned}
\min_{W = BL} \; \frac{\mathrm{tr}(B^{T} B) \, \max_j \|L_j\|_2^{2}}{h(\epsilon, \delta)^{2}}
&\ge \frac{1}{n \, h(\epsilon, \delta)^{2}} \min_{W = BL} \|B\|_F^{2} \, \|L\|_F^{2} \\
&= \frac{1}{n \, h(\epsilon, \delta)^{2}} \left( \|W\|_{*} \right)^{2} \\
&= \frac{1}{n \, h(\epsilon, \delta)^{2}} \left( \sum_{k=1}^{s} \lambda_k \right)^{2}
\end{aligned}
\]

The first inequality is due to $\max_j \|L_j\|_2^{2} \ge \frac{1}{n} \sum_{j=1}^{n} \|L_j\|_2^{2} = \frac{1}{n} \|L\|_F^{2}$. Note that this inequality is tight, and the
equality holds when every column of $L$ lies on the surface of the unit ball. The first equality is due
to the variational formulation of the nuclear norm (see, e.g., [Srebro et al. 2004]):

\[ \|W\|_{*} = \min_{B, L} \|L\|_F \cdot \|B\|_F, \quad \text{s.t.} \quad W = BL. \]

We thus reach the conclusion of the lemma. □

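We next compare the error upper bound and the error lower bound for LRM under
(ε, δ)-differential privacy.

Theorem B.9. The ratio between the error upper and lower bounds of mechanism
$M_{LRM,(\epsilon,\delta)}(Q, D)$ with the optimal decomposition $W = B^{*}L^{*}$ is bounded by $O\left( (\kappa(W) p(W))^{2} \frac{n}{s} \right)$.

Proof. We compare the upper and lower bounds in Lemmas B.7 and B.8 as follows.

\[
\frac{p(W)^{2} \sum_{k=1}^{s} \lambda_k^{2} / h(\epsilon, \delta)^{2}}{\left( \sum_{k=1}^{s} \lambda_k \right)^{2} / \left( n \, h(\epsilon, \delta)^{2} \right)}
\le \frac{s \lambda_1^{2} \, p(W)^{2}}{\frac{1}{n} (s \lambda_s)^{2}}
= \left( \kappa(W) p(W) \right)^{2} \cdot \frac{n}{s}
\]

We thus reach the conclusion of the theorem. □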
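
The comparison can be illustrated numerically. The sketch below evaluates the upper bound of Lemma B.7, the lower bound of Lemma B.8 (read as $(\sum_k \lambda_k)^2 / (n\,h^2)$), and the cap $(\kappa(W) p(W))^2 n / s$ for a made-up spectrum. The singular values, the coherence p, the domain size n and the value of h are all hypothetical inputs, and the function names are chosen for this example.

```python
def upper_bound(singular_values, p, h):
    """Upper bound of Lemma B.7: p^2 * sum_k lambda_k^2 / h^2."""
    return p ** 2 * sum(x * x for x in singular_values) / h ** 2


def lower_bound(singular_values, n, h):
    """Lower bound of Lemma B.8: (sum_k lambda_k)^2 / (n * h^2)."""
    return sum(singular_values) ** 2 / (n * h ** 2)


def ratio_cap(singular_values, p, n):
    """(kappa * p)^2 * n / s, with kappa = largest / smallest nonzero singular value."""
    kappa = max(singular_values) / min(singular_values)
    return (kappa * p) ** 2 * n / len(singular_values)


lams = [9.0, 4.0, 2.5, 1.0]  # hypothetical nonzero singular values (s = 4)
p, n, h = 0.6, 64, 0.05      # hypothetical coherence, domain size and h(eps, delta)

ratio = upper_bound(lams, p, h) / lower_bound(lams, n, h)
cap = ratio_cap(lams, p, n)
print(ratio, cap)
```

The factor h(ε, δ) cancels in the ratio, so the gap between the two bounds depends only on the spectrum of W, its coherence, and the dimensions n and s.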


The above theorem leads to similar conclusions as in the case of ε-differential privacy, except that
here we compare LRM with an improved version of NOD described in the proof of Lemma B.7.
Meanwhile, the above ratio also involves an additional parameter p(W), i.e., the coherence of
the workload matrix.
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